Elastic Sharding Replica Management

Determine when your Elasticsearch clusters have too few replicas per shard, and automatically kick off healing.

The problem

When we first deploy Elasticsearch, it’s easy to craft the indexes we need and ensure they’re nicely balanced across all the nodes in the cluster. But as we ingest more data, queries can start to slow down. In time, the cluster is crawling along, and dependent applications may start going offline.

How did we get here? How do we get out? Should we wait until catastrophic failure before we take action?

With all the other Ops tickets in play, it’s easy to not even discover there’s a problem until it’s too late: the application is offline or users have moved on to competitors’ tools. Oops.

The solution

This Shoreline runbook scans the Elasticsearch indexes for low-performing queries. When slow indexes are identified, Shoreline automatically adjusts the replica count for those indexes, allowing them to self-heal and bringing them back up to peak performance.
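
To make the idea concrete, here is a minimal Python sketch of the same scan-and-adjust loop, written directly against the Elasticsearch REST API. It is not Shoreline's implementation. The cluster URL, the 500 ms threshold, the replica cap, and all function names are assumptions chosen for illustration. It reads per-index search stats, computes an average query latency, and raises `number_of_replicas` by one for any index above the threshold.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster
SLOW_MS = 500.0                   # hypothetical latency threshold
MAX_REPLICAS = 3                  # hypothetical upper bound on replicas


def average_query_ms(index_stats):
    """Mean query latency in ms from one index entry of GET /_stats/search."""
    search = index_stats["total"]["search"]
    total = search["query_total"]
    return search["query_time_in_millis"] / total if total else 0.0


def slow_indices(stats, threshold_ms=SLOW_MS):
    """Names of indexes whose mean query latency exceeds the threshold."""
    return sorted(
        name
        for name, entry in stats["indices"].items()
        if average_query_ms(entry) > threshold_ms
    )


def replica_settings_body(current, max_replicas=MAX_REPLICAS):
    """Settings payload that adds one replica, capped at max_replicas."""
    return {"index": {"number_of_replicas": min(current + 1, max_replicas)}}


def _request(method, path, body=None):
    """Send a JSON request to the cluster and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def heal():
    """Scan all indexes and add a replica to each slow one."""
    stats = _request("GET", "/_stats/search")
    for name in slow_indices(stats):
        settings = _request("GET", f"/{name}/_settings")
        current = int(settings[name]["settings"]["index"]["number_of_replicas"])
        _request("PUT", f"/{name}/_settings", replica_settings_body(current))
```

Calling `heal()` on a schedule approximates the runbook's behavior. Two caveats apply to this sketch: the search stats are cumulative counters, so the average is a coarse signal rather than a measure of recent latency, and an added replica is only allocated if the cluster has a node that does not already hold a copy of that shard.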