Customers often ask us this question, primarily because operational data is expensive to store. Let's look at a couple of common cases and see how to manage operational data efficiently.
Case 1: An ongoing event
If an event is going on right now, you want real-time data, maybe down to per-second granularity, so you can debug it live without having to query each box separately. Here's how most companies mishandle it:
Even though production ops at its core is a distributed system, they handle the events by pulling all the data into one system, which:
At Shoreline, we believe that the ground truth is in the boxes you manage. We treat the distributed system like a distributed system: we push the questions you ask out to the nodes and pull the answers back, giving you a real-time, per-second view of metrics, resources, and the output of Linux commands.
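To make the "push the question out, pull the answers back" idea concrete, here is a minimal scatter-gather sketch. It is an illustration only, not Shoreline's implementation: the nodes are simulated as local Python functions, and the names `make_fake_node`, `scatter_gather`, and the `cpu_percent` question are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a host: a function that answers a question
# from state that lives only on that host.
NodeAgent = Callable[[str], float]


def make_fake_node(cpu: float, mem: float) -> NodeAgent:
    """Build a simulated node (an assumption for this demo, not a real agent)."""
    local_state = {"cpu_percent": cpu, "mem_percent": mem}

    def answer(question: str) -> float:
        # The node evaluates the question against its own local data.
        return local_state[question]

    return answer


def scatter_gather(nodes: Dict[str, NodeAgent], question: str) -> Dict[str, float]:
    """Send the same question to every node in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {name: pool.submit(agent, question) for name, agent in nodes.items()}
        return {name: future.result() for name, future in futures.items()}


fleet = {
    "host-a": make_fake_node(cpu=12.0, mem=40.0),
    "host-b": make_fake_node(cpu=97.5, mem=71.0),
    "host-c": make_fake_node(cpu=33.0, mem=55.0),
}

# Only the small answers travel back, not the raw data behind them.
cpu_by_host = scatter_gather(fleet, "cpu_percent")
hot_hosts = sorted(host for host, value in cpu_by_host.items() if value > 90.0)
print(hot_hosts)
```

In a real fleet the call inside `scatter_gather` would be a network request to an agent on each host; the point of the sketch is that the question moves to the data rather than the data moving to one central system.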
Case 2: Operational reporting
If you want to do operational reporting over, let's say, the last month, the data doesn't need to be as fine-grained. You need accurate, high-fidelity information about the issues that occurred so you can keep track of trends and anomalies, but you don't care about the rest of the data. This is how we deal with it at Shoreline: (This is going to get a bit technical…so buckle up!)
We transform the raw data into the time-frequency domain using wavelets. Wavelets are the same technology that underpins JPEG 2000 image compression. They give us great compression, about 40x if you can believe it, which enables:
All that geeking out aside, the basic point is that you need:
We don't think people need to store operational data for a long time, because we don't think they will look at it. But we make it efficient for those who want to. Storing 100 metrics sampled every second costs us about $0.25/host/year. That's so inexpensive we don't even bother charging for it right now.
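For readers who want to see why a wavelet transform compresses metrics well, here is a small sketch of the idea from Case 2. It is not Shoreline's pipeline. It assumes the simplest wavelet (Haar), a made-up 16-sample metric with one spike, and a hypothetical threshold of 0.5. Small coefficients, which mostly encode noise, are dropped, while the few large ones that encode the spike are kept.

```python
from typing import List


def haar_forward(signal: List[float]) -> List[float]:
    """Full Haar decomposition; the signal length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    out = list(signal)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages carry the trend; pairwise differences carry detail.
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_inverse(coeffs: List[float]) -> List[float]:
    """Invert haar_forward by rebuilding each pair from its average and detail."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged: List[float] = []
        for avg, det in zip(out[:length], out[length:2 * length]):
            merged.extend([avg + det, avg - det])
        out[:2 * length] = merged
        length *= 2
    return out


def drop_small(coeffs: List[float], threshold: float) -> List[float]:
    """Zero out coefficients below the threshold (the lossy compression step)."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


# Illustrative data: a mostly flat metric with one anomaly at index 5.
signal = [50.0, 50.2, 49.8, 50.0, 50.1, 95.0, 50.0, 49.9,
          50.0, 50.0, 50.1, 49.9, 50.0, 50.0, 50.0, 50.0]
threshold = 0.5  # hypothetical tuning knob

compressed = drop_small(haar_forward(signal), threshold)
kept = sum(1 for c in compressed if c != 0.0)
restored = haar_inverse(compressed)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"kept {kept} of {len(signal)} coefficients, max error {max_error:.3f}")
```

Only the non-zero coefficients need to be stored, so a quiet metric with rare anomalies shrinks a lot while the anomaly itself survives. The compression ratio you get in practice depends on the data, the wavelet, and the threshold.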