The 2025 DORA Report w/special guest Fred Hebert
We chatted with Fred about last fall’s DORA report and what it means for those of us who are thinking about resilience and software engineering in the age of AI.
Building and Revising Adaptive Capacity Sharing for Technical Incident Response with Beth Adele Long
We chat about a seminal work in Resilience Engineering and Software with one of the authors.
Outsourcing and Resilience
We answer a reader question about how to activate resilience when teams are far away from each other (organizationally and geographically).
The Messy 9 and Coding with AI - A Panel Discussion
A panel discussion on how people actually use AI in their day-to-day software work, with special guests John Allspaw, Sheeri Cabral, Martin Smith and David Woods.
Going Solid
We talk about the seminal safety science paper “Going Solid” and how it relates to our software world.
The Year in Resilience w/special guest John Allspaw
We chatted with John about the year in Resilience, incidents, and all the things.
Incident Status: On Hold w/special guest Will Gallego
We talked about whether there should be an “on hold” status for incidents, which also bled into talking about incident severity, and other lies we tell ourselves (vegetables?)
Complex Systems and the Messy Nine w/special guests Dave Woods and John Allspaw
We get to premiere a new RE concept from Dave Woods. Come join us for a discussion on Resilience Engineering and the Messy 9.
All the things about Incident Command
We got a question about how to advocate for having an incident command/comms role.
Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein
Join us as we go deep into the differences between root cause analysis and other perspectives on how to do post-incident learning and analysis.
First Stories/Second Stories
We’re talking about first stories and second stories, and their effects on people involved in incidents.
How (Not) to Introduce Resilience Engineering at Work with special guest Michelle Casey
We talked with Michelle Casey about her most recent blog post for the Resilience in Software Foundation, and what it takes to get people at work to latch onto some good ideas from the Resilience Engineering world.
How long should you wait after an incident to do your retro?
Someone wanted to know if we think software should be more like the FAA… we got a little sidetracked by action items, but there’s some advice in here, we think?
Lund University - Academic Theory and Practice
We talked with a distinguished panel of Lund University MSc HFSS alumni and current students to learn more about the program and bridging theory and practice.
What’s the ROI on Reliability and Resilience work?
The dreaded question that we all get... and we have some answers for you, sort of?
Runbooks: the Good, Bad and Ugly w/special guest Andrew Hatch
We chatted with Andrew Hatch about runbooks and when they’re terrible, or when they might not suck so bad.
What is an incident? How come no one declares them?
We talked about the politics and trouble with declaring incidents, and how to improve how your organization handles them.
Chaos Engineering w/special guest Casey Rosenthal
We chatted with Casey Rosenthal about what chaos engineering is and how it’s different from (or the same as) resilience engineering.
Burnout on Aisle 3
Colette and Clint talk burnout and why resilience engineering sees so much of it.
Resilience, Complexity, and Your Boss a collab w/Punk Rock Safety
We met with the guys at Punk Rock Safety to talk through how to do resilience engineering even if your boss doesn’t get it (yet).