Decoding Taylor Swift’s Ticketmaster Debacle

What can we learn from the Ticketmaster (Taylor Swift) debacle? Ticketmaster faced unprecedented demand that caused its site to crash for many hours. Had it designed a reliable service that behaves like an escalator rather than an elevator, this could have been avoided.
3 min
Summary

Let’s talk about the Ticketmaster (Taylor Swift) Debacle and what we can learn from it.

You may remember this incident where Ticketmaster tried to sell tickets for a Taylor Swift concert, and their site went down for hours.

They said that it happened due to the unprecedented demand.

To me, that’s nonsense, because this situation could easily have been avoided if they had load-tested their systems properly.

But I want to talk about a deeper underlying issue: The first job of a service is to protect itself.

You do that by putting a queue in front of your service, which acts as a buffer between the service and incoming requests. Suppose your service can handle 500 requests per second, and 50,000 requests arrive in one second. Instead of crashing, it serves 500 of them and either returns an error message for the other 49,500 or queues them until it has capacity.
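To make that arithmetic concrete, here is a minimal Python sketch of a service fronted by a bounded buffer. The class `QueuedService`, its `capacity` and `max_queue` parameters, and the one-second "tick" are hypothetical illustrations, not Ticketmaster's or Shoreline's design; a production system would more likely rely on a message broker or a load balancer's request queue than an in-process counter. With `max_queue=0` the excess gets an immediate error; with a large buffer the excess waits and drains at a steady 500 per tick.

```python
from dataclasses import dataclass


@dataclass
class TickResult:
    served: int    # requests processed during this tick
    queued: int    # requests still waiting in the buffer after this tick
    rejected: int  # requests turned away with an error during this tick


class QueuedService:
    """Hypothetical service protected by a bounded FIFO buffer.

    Each tick (say, one second) it processes at most `capacity` requests,
    oldest first. Excess arrivals wait in the buffer up to `max_queue`;
    anything beyond that is rejected immediately instead of overloading
    the service.
    """

    def __init__(self, capacity: int, max_queue: int) -> None:
        self.capacity = capacity
        self.max_queue = max_queue
        self.backlog = 0  # number of requests currently waiting

    def tick(self, arrivals: int) -> TickResult:
        # Serve waiting requests first (FIFO), then new arrivals, up to capacity.
        from_backlog = min(self.backlog, self.capacity)
        self.backlog -= from_backlog
        from_new = min(arrivals, self.capacity - from_backlog)
        leftover = arrivals - from_new
        # Buffer whatever fits; shed the rest with an error response.
        admitted = min(leftover, self.max_queue - self.backlog)
        self.backlog += admitted
        return TickResult(from_backlog + from_new, self.backlog, leftover - admitted)


# Error-message mode: no buffer, so the excess is rejected right away.
shed = QueuedService(capacity=500, max_queue=0)
print(shed.tick(50_000))      # TickResult(served=500, queued=0, rejected=49500)

# Queue mode: buffer the whole burst, then drain it at 500 per tick.
buffered = QueuedService(capacity=500, max_queue=50_000)
print(buffered.tick(50_000))  # TickResult(served=500, queued=49500, rejected=0)
ticks = 1
while buffered.backlog > 0:
    buffered.tick(0)
    ticks += 1
print(ticks)                  # 100
```

Choosing a `max_queue` between the two extremes trades waiting time against rejections: when the whole burst is buffered, the last of the 49,500 queued requests waits about 99 seconds, so past some depth an immediate error can be kinder than a very long wait.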

Had Ticketmaster used this mechanism, it would have kept its service from crashing while still serving a portion of the demand.

Think of a queue as an escalator. It moves at a consistent pace and carries a fixed amount of demand at a time; when the crowd grows, people wait their turn instead of bringing it to a halt.

In comparison, an elevator is like a service that does not handle variability in demand very well. During rush hour, elevators tend to get overwhelmed and stop functioning effectively.

So when designing a reliable service, try to create an escalator-like system instead of an elevator.
