
How to Reduce Alarm Noise

In any company, 50-80% of the alarms are noisy. Employees get trained to snooze these alarms – which isn’t always the right thing to do. Wouldn't it be better if you could easily see which are your top issues each week, and which alarms might be set incorrectly?
3 min
Summary

Our new, free tool, Incident Insights, showed 2,100 incidents for one beta customer.

Interestingly, he had expected this number to be 100.

The gap came from the noisy alarms his team was dealing with.

In fact, in any company, 50-80% of the alarms are noisy.

That’s why your employees have alert fatigue and get trained to snooze these alarms – which isn’t always the right thing to do.

Wouldn't it be better if you could easily see which are your top issues each week, and which alarms might be set incorrectly?

If the mean time to resolution for an alarm is really short, it usually means someone just clicked OK to close the alert.

You probably need to get rid of that alarm or change its threshold.
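
As a rough illustration of that check, here is a minimal Python sketch that groups incidents by alarm and flags any alarm whose mean time to resolution falls under a cutoff. The incident records, the alarm names, and the 60-second cutoff are all made-up assumptions for the example; this is not how Incident Insights computes anything.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical incident records: (alarm name, seconds from open to resolve).
incidents = [
    ("disk_usage_high", 12),
    ("disk_usage_high", 8),
    ("disk_usage_high", 15),
    ("api_latency_high", 1800),
    ("api_latency_high", 2400),
]


def suspiciously_short(incidents, threshold_seconds=60):
    """Return the alarms whose mean time to resolution is below the cutoff."""
    durations = defaultdict(list)
    for alarm, seconds in incidents:
        durations[alarm].append(seconds)
    # The cutoff is an assumption; pick one that fits your own incidents.
    return sorted(
        alarm
        for alarm, values in durations.items()
        if mean(values) < threshold_seconds
    )


flagged = suspiciously_short(incidents)
print(flagged)
```

Alarms that show up in `flagged` are candidates for removal or a threshold change, not automatic deletions; someone still needs to look at why they are being closed so quickly.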

Or you could use Shoreline as the recipient of your high-level alarms and convert them to precise ones.

For example, let’s say the CPU is high and…
- the input request rate is also high while the latency is not changing → This means the system is operating normally. We don’t need to involve a human.
- the JVM process is garbage collecting → Here, we should collect a heap dump and bounce the pod. We still do not need to wake anybody up.
- you have no idea why it’s happening → Now you need to wake someone up and give them a notebook that tells them all the diagnostics you've run in advance.
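
The three cases above can be written down as a small decision function. This is a plain Python sketch of the logic only, not Shoreline's actual language or API; the `Signals` fields and `Action` names are hypothetical and would come from whatever metrics and process state you collect when the CPU alarm fires.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "no action: system is operating normally"
    AUTO_REMEDIATE = "collect a heap dump and restart the pod"
    PAGE_HUMAN = "page on-call with the diagnostics attached"


@dataclass
class Signals:
    # Hypothetical inputs gathered at the moment the high-CPU alarm fires.
    request_rate_high: bool
    latency_changed: bool
    jvm_garbage_collecting: bool


def triage_high_cpu(s: Signals) -> Action:
    """Turn a generic high-CPU alarm into one of three precise outcomes."""
    if s.request_rate_high and not s.latency_changed:
        # More traffic, same latency: the system is just busy.
        return Action.IGNORE
    if s.jvm_garbage_collecting:
        # Known cause with a known fix: no human needed.
        return Action.AUTO_REMEDIATE
    # Unknown cause: this is the only case that wakes someone up.
    return Action.PAGE_HUMAN


print(triage_high_cpu(Signals(True, False, False)).name)
```

The order of the checks matters: the "operating normally" case is tested first, so a busy but healthy system never triggers a restart or a page.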

In all cases, you should start with noise reduction by:
- changing alarm thresholds (can't be too low, can't be too high)
- getting rid of alarms that don't do anything for you
- making your alarms precise by combining metrics, logs, and system states to figure out what's going on
- automating away repetitive tasks

Back at AWS, I used to get a report every week from my team on all the incidents from the prior week to do exactly this work, but it took them a long time to generate.

So I'm really happy to introduce this new tool that gets you the same value for free and without any effort.
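
To give a feel for what a weekly top-issues report boils down to, here is a minimal Python sketch that counts incidents per alarm for one week and lists the most frequent ones. The incident log, the alarm names, and the dates are invented for the example, and the sketch makes no claim about how Incident Insights works internally.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical incident log: (day the alarm fired, alarm name).
incident_log = [
    (date(2024, 1, 1), "disk_usage_high"),
    (date(2024, 1, 2), "disk_usage_high"),
    (date(2024, 1, 3), "cpu_high"),
    (date(2024, 1, 5), "disk_usage_high"),
    (date(2024, 1, 6), "cpu_high"),
    (date(2024, 1, 7), "api_latency_high"),
    (date(2024, 1, 9), "disk_usage_high"),  # falls in the following week
]


def top_issues(log, week_start, n=3):
    """Return the n most frequent alarms in the 7 days starting at week_start."""
    week_end = week_start + timedelta(days=7)
    counts = Counter(name for day, name in log if week_start <= day < week_end)
    return counts.most_common(n)


report = top_issues(incident_log, date(2024, 1, 1), n=2)
for name, count in report:
    print(f"{name}: {count}")
```

The alarms at the top of a list like this are the first places to look for thresholds to change, alarms to delete, or repetitive fixes to automate.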
