Kafka Topic Management

When a Kafka topic grows too long because messages are not consumed fast enough, applications may begin to break.

Major outage

The problem

A Kafka topic that keeps growing in length is a sign that messages are not being consumed as fast as they are produced. When messages aren’t consumed in time, applications may begin to break, with reports and transactions being the first to fail.

On the surface, this is not a difficult problem to diagnose. Close monitoring of consumer metrics will tell you whether messages are being consumed. If the issue is caught early, the consumer pods simply need to be restarted. The real trouble arises when monitoring does not keep up. The longer the lag goes unnoticed, the more things get out of sync and the harder it is to fix. This will most likely lead to availability issues for customers.
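As a rough illustration of the monitoring described above, consumer lag can be derived from two sets of offsets per partition: the latest offset in the topic and the offset the consumer group has committed. The sketch below is not part of any Shoreline tooling. The function names are hypothetical, the offsets are assumed to have been fetched elsewhere, and a partition with no committed offset is assumed to be entirely unconsumed.

```python
from typing import Dict, List


def partition_lag(end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: latest offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition without a committed offset counts as fully unconsumed.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end offset never produces negative lag.
        lag[partition] = max(end - committed, 0)
    return lag


def total_lag(end_offsets: Dict[int, int],
              committed_offsets: Dict[int, int]) -> int:
    """Total number of messages the consumer group still has to process."""
    return sum(partition_lag(end_offsets, committed_offsets).values())


def is_falling_behind(samples: List[int]) -> bool:
    """True when lag increased between every pair of consecutive samples."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```

Sampling `total_lag` at a fixed interval and passing the recent values to `is_falling_behind` gives an early signal: a lag that rises on every sample suggests the consumers are not catching up on their own.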

The solution

Shoreline’s Kafka OpPack detects Kafka lag and restarts consumer pods to clear it. It works by letting you designate the group of pods that consume the topic. Shoreline can either capture metrics from a Kafka exporter or call Kafka directly to get the topic length.
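The detect-then-restart flow can be sketched in a few lines. This is not Shoreline's implementation. It assumes the metrics come from the open-source kafka_exporter in Prometheus text format, where `kafka_consumergroup_lag` carries `consumergroup`, `topic` and `partition` labels. The function names, the lag threshold and the pod label selector are all hypothetical, and the restart is only returned as a `kubectl` command rather than executed.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches exporter lines such as:
# kafka_consumergroup_lag{consumergroup="billing",partition="0",topic="orders"} 1200
_LAG_LINE = re.compile(r'^kafka_consumergroup_lag\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def lag_by_group_and_topic(metrics_text: str) -> Dict[Tuple[str, str], float]:
    """Sum per-partition lag into one value per (consumer group, topic)."""
    totals: Dict[Tuple[str, str], float] = {}
    for line in metrics_text.splitlines():
        match = _LAG_LINE.match(line.strip())
        if not match:
            continue  # skips comments and every other metric
        labels = dict(_LABEL.findall(match.group("labels")))
        key = (labels.get("consumergroup", ""), labels.get("topic", ""))
        totals[key] = totals.get(key, 0.0) + float(match.group("value"))
    return totals


def restart_command(pod_selector: str, namespace: str) -> List[str]:
    """Build a command that deletes the designated consumer pods.

    Assumption: the pods belong to a controller (for example a Deployment)
    that recreates them after deletion.
    """
    return ["kubectl", "delete", "pod", "-n", namespace, "-l", pod_selector]


def plan_restart(metrics_text: str, group: str, topic: str, max_lag: float,
                 pod_selector: str, namespace: str = "default") -> Optional[List[str]]:
    """Return a restart command when lag exceeds max_lag, otherwise None."""
    lag = lag_by_group_and_topic(metrics_text).get((group, topic), 0.0)
    if lag > max_lag:
        return restart_command(pod_selector, namespace)
    return None
```

Returning the command instead of running it keeps the decision logic easy to test and leaves the actual restart to whatever automation owns the cluster.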

Highlights

Customer experience impact: Reports and transactions fail (High)
Occurrence frequency: Monthly for larger fleets (Medium)
Shoreline time to repair: 5 minutes (Low)
Time to diagnose manually
Security
Cost impact
Time to repair manually
SRE time spent on diagnosis and repair
Medium
