Today I'm thinking again about the "Children of the Magenta" lecture.
-
The lecturer is trying to convey to his students: there will be novel situations, or situations where events unfold rapidly. When that happens, your first reflex should be to cut the automation out of the loop, because for all the extra leverage it gives you, the one thing it lacks is a brain. You need to maintain the skill of using your brain.
Anyway, it's a good lecture. Aviation's routinely been several decades ahead of tech in terms of learning the rough lessons. It's worth learning their history.
-
The Children of the Magenta lecture: https://www.youtube.com/watch?v=WITLR_qSPXk . The A/V quality isn't great due to age and restoration, but the content is _well_ worth it.
-
@danderson The number of Admiral Cloudberg essays on crashes and incidents that look like:
1) something weird/lopsided is going on with the plane
2) the autopilot silently corrects for it and nothing appears wrong
3) eventually the problem gets big enough that the autopilot can't compensate for it
4) the autopilot shuts off and WHOOPS, we're upside down now
Even if you _are_ trained to fly manually, this is still a kind of dependence that is bad!
-
@aldeka Yup, even after you account for automation dependence, automation UX is still a serious issue. AF447 is the poster child for that: automation that not only failed, but failed to convey to the pilots that it had failed, or that something had departed from nominal.
I also think about that a lot in relation to technologies like Kubernetes, which encourage ignorance of what's going on until it's so dire the automation can't hide it any more.
-
@danderson The Glass Cage from Nicholas Carr is probably the one book that completely changed my understanding of automation and how it impacts humans using it. I'd say it's my primary driver for implementing automation that's as easy to understand as possible.
-
Dave Anderson replied to Dave Anderson:
Incidentally, you can find other videos of AA's Advanced Aircraft Maneuvering Program on the tubes, and they're worth it IMO. Some, like techniques for escaping from microbursts, aren't directly transferable to other fields (though I'd argue they contain an important lesson about designing for the ability to operate near the edge of the envelope).
But the ones on situational awareness and task saturation are all about the psychology of incident response, and _directly_ translate to computing.
-
I would go so far as to say that every tech incident response whose post-mortem analysis includes words about confusion happening during an incident can be _directly_ explained and addressed by studying how aviation thinks about situation awareness, task saturation, and crew resource management.
I have in fact gone this far, and done a whole-ass training session on incident response with that precise angle.
-
Task saturation is what happens when the rate of input events exceeds the brain's ability to process them. The brain has an automatic defense mechanism for this, called "ignore the excess inputs".
That's task saturation in a nutshell: you get DoSed by the rate of incoming information, and discard as much as you need to avoid a meltdown.
What happens when you discard information about a situation? Loss of situation awareness.
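The same failure mode shows up in software as load shedding. A minimal sketch of the analogy (my own illustration, not from the thread; all names here are made up): a bounded inbox that silently drops events once it's full, just like the brain's "ignore the excess inputs" reflex.

```python
from collections import deque

class BoundedInbox:
    """A load-shedding queue: when the arrival rate exceeds capacity,
    excess events are dropped instead of overwhelming the consumer.
    (Illustrative analogy only; not a real incident-response tool.)"""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0  # each drop is a small loss of situation awareness

    def offer(self, event) -> bool:
        if len(self.events) >= self.capacity:
            self.dropped += 1  # the "ignore the excess inputs" reflex
            return False
        self.events.append(event)
        return True

    def take(self):
        return self.events.popleft() if self.events else None

inbox = BoundedInbox(capacity=3)
for i in range(10):  # 10 alerts arrive before any can be processed
    inbox.offer(f"alert-{i}")

print(len(inbox.events))  # 3 -- only what fits is retained
print(inbox.dropped)      # 7 -- everything else was silently discarded
```

The unsettling part, in both the queue and the brain, is that the consumer never sees what was dropped: from the inside, the retained events look like the whole picture.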
-
Incident response isn't about observability, or tools, or oncall rotations, or automation. It's 100% about managing situation awareness in an emergent situation. If you look at ICS, the Incident Command System pioneered by firefighters and now used as an emergency response framework all over the world, its _entire_ purpose is to help maintain situation awareness and prevent responders from reaching task saturation. The actual response to the incident is a consequence of ICS enabling that.
-
That's also why ICS can feel weird to people who first see it, because it contains almost nothing about how to respond to incidents. It's entirely concerned with chains of command, task delegation, frameworks for managing the flow of information... But no ICS training will tell you how to deal with a wildfire, or a Kubernetes outage.
Because that's not its purpose. Its purpose is to create the conditions for maintaining situation awareness, so that domain experts can work the problem.
-
Dave Anderson replied to Dave Anderson:
(note: this thread has escaped from my usual bubble, and as such I've turned off notifications on it. If you're attempting to have a conversation or debate me, sorry, but I won't see it, and experience tells me a fedi crowd with enough strangers-to-me in it won't result in good conversations anyway.
If you like, think of this as an object lesson in resource management and avoiding task saturation! I am deliberately limiting a noisy input to maintain my processing capacity)
-
@danderson 100%. I have some experience with AIIMS, which I think is the Australian version of MACS/ICS, and the sort of things that we talk about in SRE incident response are absolutely less sophisticated.
AIIMS includes an element of attribution to decision making, which is good in multi-agency response but I'm not sure is helpful in non-life-threatening incidents. But the delegation and span of control ideas are great and I've had good results teaching this in SRE orgs I've been a part of.
-
@danderson I know you've muted and I respect it (and I don't expect a response).
But thank you for posting. This was SUPER interesting and I think the abstract lesson applies MANY places today (from code copilots to chatbots to "self-driving" cars). Bookmarked!
-
Hazardius 🡗🡗🡗 🏳️🌈🏳️⚧️ replied to Dave Anderson:
@danderson I am a bit confused about the ability/defence mechanism described here, but I assume it's due to being #ActuallyAutistic . Regardless - it's amazing how brains can have various failsafes (even if they don't work on the same level for everyone).