
Reliability Engineering: The Southwest Debacle

Southwest operates on a point-to-point model because it is less expensive for the airline and quicker for passengers. The trade-off is that a disruption on one route affects the entire chain. To engineer a reliable architecture, you need to balance cost versus reliability in an economically constrained way.
Summary

“Why is COVID better than Southwest Airlines? Because COVID is airborne.”

I read this on a handwritten sign while flying during the holidays.

The joke points to the problem Southwest faced last December, when almost 60% of its flights were grounded.

Many reasons have been cited for this: weather, high demand, insufficient crew and planes, the airline's outdated SkySolver software, and so on.

While there is some truth to all these explanations, my question is: Why did this happen to Southwest and not to other airlines?

The real difference between Southwest and other airlines that didn’t fall over is the technical architecture of how they operate.

Southwest operates on a point-to-point location model, which means that each flight is directly routed from one location to another, without connecting through a central hub.

Because the same planes and crews fly one leg after another, a disruption on one route carries over to the next and affects the entire chain.

On the other hand, most other airlines use a hub-and-spoke model, which is more resilient to failures.

This model allows the airlines to adopt an n+k approach: n things need to work, k spares are held in reserve, and the system can tolerate up to k failures.

So they can have k reserve planes and crew available at the hub to ensure that there is a contingency in case of disruptions.

To do the same in the point-to-point model, you’d need to have k reserves at all locations, which isn’t economically feasible.
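
The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the network sizes (4 hubs, 100 locations, k = 2) are hypothetical and are not Southwest's or any other airline's real figures. It assumes each reserve site holds its own k spares.

```python
def reserves_needed(reserve_sites: int, k: int) -> int:
    """Total spare units when each reserve site holds k spares."""
    if reserve_sites < 0 or k < 0:
        raise ValueError("reserve_sites and k must be non-negative")
    return reserve_sites * k


def tolerates(failures: int, k: int) -> bool:
    """An n+k system keeps running while failures do not exceed k."""
    return failures <= k


# Hypothetical network; all numbers are illustrative assumptions.
K = 2            # spares held at each reserve site
HUBS = 4         # hub-and-spoke: reserves are pooled at the hubs only
LOCATIONS = 100  # point-to-point: every location would need its own reserves

hub_and_spoke_total = reserves_needed(HUBS, K)
point_to_point_total = reserves_needed(LOCATIONS, K)

print(f"hub-and-spoke reserves:  {hub_and_spoke_total}")
print(f"point-to-point reserves: {point_to_point_total}")
print(f"ratio: {point_to_point_total / hub_and_spoke_total:.0f}x")
```

With these made-up numbers, the point-to-point network needs 25 times as many reserves for the same k, which is the cost gap the argument rests on.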

There are more nuances to this, such as the point-to-point model being less expensive for the airline and quicker for the passengers.

But to engineer a reliable architecture, you need to balance cost versus reliability in an economically constrained way.
