A Kubernetes service is effectively a load balancer across its pods, the running instances of the process that does the work. If every pod behind the service is unresponsive or terminated, the microservice it fronts is offline.
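To make the service-to-pod relationship concrete, here is a minimal Python sketch. It assumes the common case of an equality-based label selector, and it models Kubernetes rather than calling it: the `Pod` class, its `ready` flag, and the helper names are hypothetical stand-ins for what a real cluster tracks through the service's EndpointSlices. A service counts as offline when no pod matching its selector is ready.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    labels: dict[str, str]
    ready: bool  # hypothetical simplification of the pod's Ready condition


def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based match: every selector key/value must appear in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def backing_pods(selector: dict[str, str], pods: list[Pod]) -> list[Pod]:
    """Return the pods a service with this selector would load-balance across."""
    if not selector:
        # A service without a selector gets no automatically managed endpoints.
        return []
    return [pod for pod in pods if selects(selector, pod.labels)]


def service_is_offline(selector: dict[str, str], pods: list[Pod]) -> bool:
    """A service with no ready backing pods has nowhere to route traffic."""
    return not any(pod.ready for pod in backing_pods(selector, pods))


if __name__ == "__main__":
    # Hypothetical pod names and labels, for illustration only.
    pods = [
        Pod("checkout-a", {"app": "checkout"}, ready=False),
        Pod("checkout-b", {"app": "checkout"}, ready=False),
        Pod("cart-a", {"app": "cart"}, ready=True),
    ]
    print(service_is_offline({"app": "checkout"}, pods))  # True
    print(service_is_offline({"app": "cart"}, pods))  # False
```

Note that one healthy `cart` pod does nothing for `checkout`: availability is judged per service, over only the pods its selector picks out.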
A pod can go down for any number of reasons. Perhaps a new version is being deployed. Perhaps a node is having hardware issues and Kubernetes is rescheduling the workload elsewhere in the cluster. Or perhaps bad data or a software bug caused a crash.
One pod going down is no big deal. Kubernetes can quickly start a replacement, and the software remains highly available. We run multiple instances of the application for exactly this reason: to protect against hardware or software failure.
But if all the pods go down at once, the microservice is effectively offline, and the result is customer impact, negative reviews, and a tarnished brand reputation.
This Shoreline interactive runbook queries Kubernetes to find all the pods behind each service, then sends a request to each pod to confirm it is functioning as expected. If zero pods respond correctly, the service is down and we know we have a problem.
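The check described above can be sketched in Python with only the standard library. This is not Shoreline's implementation; it is an illustration under stated assumptions. The pod IPs are assumed to be already known (in a real cluster they would come from the Kubernetes API, for example the service's EndpointSlices), and the health path `/healthz` and port `8080` are hypothetical placeholders for whatever the application actually exposes. The probe is passed in as a parameter so the counting logic can be exercised without a cluster.

```python
import http.client
import urllib.request
from collections.abc import Callable, Iterable

# Hypothetical health endpoint; substitute what your service actually exposes.
HEALTH_PATH = "/healthz"
HEALTH_PORT = 8080


def http_probe(pod_ip: str, timeout: float = 2.0) -> bool:
    """Send one GET directly to a pod (bypassing the service) and report success."""
    url = f"http://{pod_ip}:{HEALTH_PORT}{HEALTH_PATH}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (non-2xx) subclass OSError, as do timeouts and
        # refused connections; HTTPException covers malformed responses.
        return False


def count_healthy(pod_ips: Iterable[str], probe: Callable[[str], bool] = http_probe) -> int:
    """Probe every pod behind a service and count the ones that answer correctly."""
    return sum(1 for ip in pod_ips if probe(ip))


def service_has_outage(pod_ips: Iterable[str], probe: Callable[[str], bool] = http_probe) -> bool:
    """Zero healthy pods (including zero pods at all) means the service is offline."""
    return count_healthy(pod_ips, probe) == 0
```

Probing each pod directly, rather than sending requests through the service, matters here: a request to the service reaches only one pod chosen by the load balancer, so it cannot reveal how many of the pods behind it are actually healthy.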