How to Manage Your Operational Data Efficiently

"How long should we keep operational data?"
Summary

Customers often ask us this question, primarily because operational data is expensive to store. Let's look at a couple of common cases and see how to manage that data efficiently.

Case 1: An ongoing event

If an event is going on right now, you want real-time data, maybe at per-second granularity, so you can debug the live event without querying each box separately. Here's how most companies mishandle it:

Even though production ops is, at its core, a distributed system, they handle events by pulling all the data into one central system, which:

  • creates lag and inconsistency across their data silos,
  • prevents them from knowing what's going on right now, and
  • costs them a lot of money, because they end up storing a lot of unneeded metrics.

At Shoreline, we believe the ground truth lives in the boxes you manage. We treat the distributed system like a distributed system: we push the questions you ask out to the nodes and pull the answers back, giving you a per-second, real-time view of metrics, resources, and the output of Linux commands.
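The push-the-question pattern can be sketched as follows. This is an illustrative toy, not Shoreline's actual API: `query_node`, the node list, and the metric values are all hypothetical stand-ins. The point is that each node answers locally and only the answers travel over the wire.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node state: in a real system each node would run a
# command or read a local metric; here we simulate with static data.
NODE_METRICS = {
    "node-1": {"cpu": 0.42},
    "node-2": {"cpu": 0.91},
    "node-3": {"cpu": 0.37},
}

def query_node(node, metric):
    # The question is pushed to the node; only the answer comes back.
    return node, NODE_METRICS[node][metric]

def fan_out(metric):
    # Ask every node in parallel and collect the per-node answers.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda n: query_node(n, metric), NODE_METRICS))

print(fan_out("cpu"))
```

Because the raw data never leaves the nodes, there's no central silo to drift out of date, and the answer reflects what's happening on each box right now.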

Case 2: Operational reporting

If you want to do operational reporting over, say, the last month, the data doesn't need to be as fine-grained. You need accurate, high-fidelity information about the issues that occurred so you can track trends and anomalies, but you don't care about the rest of the data. Here's how we deal with it at Shoreline. (This is going to get a bit technical…so buckle up!)

We transform the raw data into the time-frequency domain using wavelets, the same family of transform-coding techniques behind formats like JPEG 2000. It gives us great compression, about 40x, if you can believe it, which enables:

  • high-resolution, per-second data, and
  • trend analysis over time, because you can match the curve's shape against history to see whether a similar event has occurred before.

All that geeking out aside, the basic point is that you need:

  • live high-resolution data, and
  • a cost-effective way to retain it for a long time.

We don't believe people should store operational data for a long time, because we don't think they'll look at it, but we make it efficient for those who do. One hundred metrics sampled once per second cost us about $0.25/host/year. That's so inexpensive we don't even bother charging for it right now.
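A back-of-the-envelope check shows why the figure is plausible. The inputs below are our own assumptions for illustration (8 bytes per raw sample, the 40x compression mentioned above, and roughly $0.023/GB-month object-storage pricing), not Shoreline's published cost model.

```python
# Rough yearly storage cost for 100 metrics sampled once per second.
metrics = 100
samples_per_year = metrics * 60 * 60 * 24 * 365   # one sample/sec each
raw_gb = samples_per_year * 8 / 1e9               # ~25 GB raw per host
compressed_gb = raw_gb / 40                       # ~0.63 GB after wavelets
yearly_cost = compressed_gb * 0.023 * 12          # $/host/year
print(f"raw: {raw_gb:.1f} GB, compressed: {compressed_gb:.2f} GB, "
      f"cost: ${yearly_cost:.2f}/host/year")
```

Under these assumptions the storage bill lands under $0.25/host/year, the same order of magnitude as the figure above; the compression is what makes long retention cheap.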
