
Risks of Automation vs. Human Errors

Automation is risky. Errors in the remediation code could worsen an outage. While that’s true, we also know that human error causes almost 5x more incidents than automation. You can fix code. You can't fix people.
2 min
Summary

“Automation is risky. Errors in the remediation code could worsen an outage.”

While that’s true, we also know that human error causes almost 5x more incidents than automation.

That’s because you can fix code, but you can't fix people.

They come and go. Some have experience, some don't. And whoever happens to be on call is whoever happens to be on call.

People make mistakes. That's why when you're writing code, you don't just ship it.

It goes through testing, deployment scripts, and the rest of the release process.

But you don't have that opportunity when you're fixing something on call, because you're dealing with it in the moment, under pressure.

That’s why the best way to reduce risk in production ops is to automate more and leave less for people to do by hand.

Further, you can make automation less risky by using tools with circuit breakers that limit how many times the automation runs and that can handle partial failures.

Basically, the tool must understand the complexities of distributed systems, so that you can focus on automating just the individual issue that happens on the individual box.
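To make the circuit breaker idea concrete, here is a minimal Python sketch. It is not Shoreline's implementation: the names RemediationBreaker and remediate, the run limit, and the time window are all hypothetical and chosen only for illustration. It shows the two safeguards mentioned above: a cap on how often the automation may run, and per-host results so that one failing host does not abort or hide the rest.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RemediationBreaker:
    """Hypothetical breaker: allow at most max_runs within window_seconds."""
    max_runs: int = 3
    window_seconds: float = 600.0
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _runs: List[float] = field(default_factory=list, init=False)

    def allow(self) -> bool:
        now = self.clock()
        # Forget runs that have aged out of the window.
        self._runs = [t for t in self._runs if now - t < self.window_seconds]
        if len(self._runs) >= self.max_runs:
            return False  # breaker is open: stop automating, escalate to a person
        self._runs.append(now)
        return True


def remediate(hosts: List[str],
              action: Callable[[str], None],
              breaker: RemediationBreaker) -> Dict[str, object]:
    """Apply action to each host, recording successes and failures separately."""
    if not breaker.allow():
        return {"status": "tripped", "succeeded": [], "failed": []}
    succeeded: List[str] = []
    failed: List[str] = []
    for host in hosts:
        try:
            action(host)
            succeeded.append(host)
        except Exception:
            # Partial failure: note it and keep going with the other hosts.
            failed.append(host)
    status = "ok" if not failed else "partial"
    return {"status": status, "succeeded": succeeded, "failed": failed}
```

A caller would treat a "tripped" or "partial" status as the signal to page a human rather than retrying the same fix indefinitely.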

