
Risks of Automation vs. Human Errors

Automation is risky. Errors in the remediation code could worsen an outage. While that's true, we also know that human error causes almost 5x more incidents than automation. You can fix code; you can't fix people.
2 min
Summary

“Automation is risky. Errors in the remediation code could worsen an outage.”

While that’s true, we also know that human error causes almost 5x more incidents than automation.

That's because you can fix code, but you can't fix people.

They come and go. Some have experience, some don't. And whoever happens to be on call is whoever happens to be on call.

People make mistakes. That's why when you're writing code, you don't just ship it.

It goes through testing, scripted deployment, and all the other processes of a release.

But you don't get that opportunity when you're fixing something on call: you're dealing with it in the moment, under pressure.

That's why the best way to reduce risk in production ops is to automate more and leave less for people to do by hand.

You can further reduce the risk of automation itself by using tools that have circuit breakers, which limit how many times the automation runs, and that can handle partial failures.
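A circuit breaker that limits how often automation runs can be sketched as a sliding-window run limiter. This is a minimal sketch, assuming a hypothetical `RemediationCircuitBreaker` class with illustrative `max_runs` and `window_s` thresholds; it is not Shoreline's implementation or any tool's real API. Once the limit is reached, the breaker refuses further automated runs so the issue can be escalated to a human instead of being retried in a loop.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class RemediationCircuitBreaker:
    """Hypothetical breaker: allows at most max_runs remediation runs per window_s seconds."""

    def __init__(self, max_runs: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_runs = max_runs
        self.window_s = window_s
        self.clock = clock  # injectable clock so the behavior can be tested
        self._runs: Deque[float] = deque()  # start times of recent runs

    def _prune(self, now: float) -> None:
        # Forget runs that have fallen out of the sliding window.
        while self._runs and now - self._runs[0] >= self.window_s:
            self._runs.popleft()

    def allow(self) -> bool:
        """Return True if another automated run is permitted right now."""
        self._prune(self.clock())
        return len(self._runs) < self.max_runs

    def run(self, action: Callable[[], bool]) -> Optional[bool]:
        """Run the action if the breaker is closed.

        Returns the action's result, or None when the breaker is open,
        meaning the automation should stop and escalate to a person.
        """
        if not self.allow():
            return None
        self._runs.append(self.clock())
        return action()


# Usage with a fake clock: three runs fit in a 10-minute window, the fourth is blocked.
fake_now = [0.0]
breaker = RemediationCircuitBreaker(max_runs=3, window_s=600, clock=lambda: fake_now[0])
outcomes = [breaker.run(lambda: True) for _ in range(4)]
print(outcomes)
```

Injecting the clock is a design choice for testability; in production the default `time.monotonic` is used so wall-clock adjustments don't reset the window.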

In short, the tool itself should handle the complexities of distributed systems, so you can focus on automating the fix for the individual issue on the individual box.
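Keeping the fix single-host while the tooling handles partial failures across the fleet can be sketched as a batched rollout. This assumes a hypothetical `fix_one_box(host)` callable that raises an exception on failure, plus illustrative `batch_size` and `max_failure_ratio` parameters; none of these come from a specific product. The fleet logic (batching, recording per-host errors, halting when too many hosts fail) lives in one place, so the per-box remediation stays simple.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FleetResult:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # host -> error message
    skipped: List[str] = field(default_factory=list)      # hosts never attempted


def remediate_fleet(hosts: List[str],
                    fix_one_box: Callable[[str], None],
                    batch_size: int = 2,
                    max_failure_ratio: float = 0.5) -> FleetResult:
    """Apply a single-host fix in batches; stop the rollout if too many hosts fail."""
    result = FleetResult()
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # Run the per-box fix concurrently within the batch; the with-block waits for all.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {host: pool.submit(fix_one_box, host) for host in batch}
        for host, fut in futures.items():
            exc = fut.exception()
            if exc is None:
                result.succeeded.append(host)
            else:
                result.failed[host] = str(exc)
        attempted = len(result.succeeded) + len(result.failed)
        if len(result.failed) / attempted > max_failure_ratio:
            # Partial failure beyond the threshold: halt and leave the rest for a human.
            result.skipped = hosts[start + batch_size:]
            break
    return result


def flaky_fix(host: str) -> None:
    # Stand-in for a real per-box remediation; fails on hosts named "bad*".
    if host.startswith("bad"):
        raise RuntimeError(f"restart failed on {host}")


report = remediate_fleet(["bad1", "bad2", "web1", "web2"], flaky_fix)
print(report)
```

Here the first batch fails entirely, so the rollout halts and `web1` and `web2` are reported as skipped rather than touched.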
