Log Processing at the Edge



The problem

Many production incidents are caused by issues that can be identified by analyzing log files. Unfortunately, centralized logging can be very expensive: large volumes of data must be collected and processed to analyze log files continuously. In some cases, the benefits outweigh the costs. In others, users want a lower-cost way to attack the problem. Shoreline provides a very low-cost way to search log files on the box, either to alert proactively on possible issues or to assist with diagnosing an existing incident.

A popular use case for Shoreline is an application that has lost its connection to an upstream service or database. The application is still up, but it is no longer functional, so it should be restarted.

Although Kubernetes is good at identifying and restarting processes that have crashed, it does not know how to handle processes that are still up but not functioning well. You might use a monitoring service to solve this issue, but moving log data is costly: licensing fees and network bandwidth add up, and the expense grows as the fleet grows.
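To make the "still up, but not functional" case concrete, here is a minimal Python sketch of how a local check might decide that a process needs a restart based only on its recent log lines. It is not Shoreline's implementation. The log messages, the `CONNECTION_LOST` pattern, the `needs_restart` function, and the threshold and window values are all assumptions chosen for illustration.

```python
import re
from collections import deque

# Hypothetical patterns for a lost upstream connection.
# Real messages depend on the application and its client libraries.
CONNECTION_LOST = re.compile(
    r"connection refused|connection reset|upstream .* unavailable",
    re.IGNORECASE,
)


def needs_restart(lines, threshold=3, window=10):
    """Return True once `threshold` of the most recent `window` lines
    report a lost connection. A single blip does not trigger a restart."""
    recent = deque(maxlen=window)  # sliding window of match flags
    for line in lines:
        recent.append(bool(CONNECTION_LOST.search(line)))
        if sum(recent) >= threshold:
            return True
    return False


healthy = ["INFO GET /health 200", "INFO query ok"] * 5
failing = [
    "ERROR db: Connection refused",
    "INFO retrying in 1s",
    "ERROR db: connection refused",
    "ERROR db: connection reset by peer",
]
print(needs_restart(healthy))
print(needs_restart(failing))
```

The sliding window is a design choice: it keeps a transient error from restarting a healthy process, while a sustained run of connection failures still trips the check quickly.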

The solution

Shoreline’s Log Processing Op Pack scans logs for conditions that you define, using regular expressions to identify the log lines to pick up once those conditions are met. An alert is then sent either as a Slack message or by creating a ticket. This Op Pack is flexible and configurable, and it can be applied to a wide range of incidents and applications. The Log Processing Op Pack differs from monitoring platforms in the following key ways:

  • Speed - Processing is at the edge; we can detect the issue in seconds with almost no latency.
  • Efficiency - There is no ingestion of logs; the filtering is done locally.
  • Scalability - Every agent watches its local box. As you keep adding nodes, you get more log capacity.
  • Affordability - Shoreline uses spare capacity on your machines and does not push log data over the network, so it requires far fewer resources.
  • Simplicity - This approach requires very little set-up and few additional back-end services to install and manage.

This Op Pack can be applied to many challenging production incidents. Shoreline customers are already using variants of this Op Pack to identify problems with Presto bad workers, Presto coordinators, and gateway process health checks.
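The local filtering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Op Pack's actual code: the `RULES` patterns, the `scan_new_lines` function, and the sample log lines are hypothetical. The sketch reads only the lines appended since the last scan, matches them against regular expressions on the box, and returns small alert records, so raw log data never has to leave the machine.

```python
import os
import re
import tempfile

# Hypothetical rules; real patterns depend on the application's log format.
RULES = {
    "db_connection_lost": re.compile(r"connection (refused|reset)", re.IGNORECASE),
    "worker_unhealthy": re.compile(r"health check failed", re.IGNORECASE),
}


def scan_new_lines(path, offset, rules):
    """Scan complete lines appended to `path` since byte `offset`.

    Returns (alerts, new_offset). Only matching lines become alerts;
    everything else is filtered out locally.
    """
    alerts = []
    pos = offset
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next scan
            pos += len(raw)
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            for name, pattern in rules.items():
                if pattern.search(line):
                    alerts.append({"rule": name, "line": line})
    return alerts, pos


# Usage: scan a throwaway log file twice, as a periodic check would.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("INFO request served\nERROR db: connection refused\n")
    first_alerts, offset = scan_new_lines(log_path, 0, RULES)

    with open(log_path, "a", encoding="utf-8") as f:
        f.write("WARN gateway health check failed\n")
    second_alerts, offset = scan_new_lines(log_path, offset, RULES)

print(first_alerts)
print(second_alerts)
```

In a real deployment, the returned alert records would be handed to whatever notifier is configured, such as a Slack message or a ticket. Remembering the byte offset between scans keeps each pass proportional to the new log volume rather than to the whole file.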

Highlights

  • Customer experience impact - High: causes degraded service
  • Occurrence frequency - High: can happen daily
  • Shoreline time to repair
  • Time to diagnose manually
  • Security
  • Cost impact - High: can dramatically reduce the cost of log monitoring
  • Time to repair manually - High: 1 hour of time to diagnose
