Kubernetes Debugging

There are a million things that can break within your Kubernetes cluster. Don’t waste time searching for that needle in the haystack.


The problem

When you experience downtime or a service interruption, it’s easy to see that something is broken, but rarely obvious whether the broken element lives at the infrastructure level or the application level. Even though your observability and monitoring tools can alert you that an issue exists, their alarms typically aren’t specific enough to nail down the cause. Searching for that needle in a haystack can be a long and arduous process, causing expensive delays on the way to fixing the issue. This is especially true for any business that operates multiple Kubernetes clusters and administers systems built on top of microservices.

With so many things — in so many places — that could be broken, how do you efficiently search for the source of an issue and implement a solution?

For many, it’s an inefficient manual process. A problem occurs, and engineers then spend hours searching Stack Overflow for answers or trying a series of random commands to diagnose and repair. The worst part about this process? It’s an unreported waste of time. Most teams don’t account for time spent searching for a solution when looking holistically at how long it took to solve an issue. 

Sure, manually running a command to fix an issue doesn’t take long. But recalling which commands to run — and in what order — to manually diagnose an issue is the silent killer of your team’s productivity.

The solution

Shoreline’s three-part set of Kubernetes debugging notebooks quickly scans across pods, nodes, and services to automatically diagnose issues, implement a series of repairs, and get your systems and services back up and running in no time. The three distinct notebooks (for pods, nodes, and services) can be used separately or together — allowing engineers to begin debugging a service, for example, then easily switch to a node or pod debugging sequence if relevant for the issue at hand. 

With pre-built Kubernetes debugging sequences, engineers benefit from expert-level Kubernetes knowledge and can make decisions and run commands faster. Each notebook includes the features listed below and is fully customizable, so operators can add or remove commands to match the specifics of their operating environment.

Deployment / Pod Debugging Notebook 

  • Instantly check deployment resources and gather and assess core metrics (CPU, memory, disk usage, etc.) at the software layer
  • Check for issues with individual pods within specific deployments
  • Check the file system at the pod level as well as the load balancer
  • Automatically implement remediation steps to delete pods, trigger a rolling restart, update a deployment’s labels, or upgrade or roll back the deployed software version (a sketch of a few of these checks follows this list)
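To make these checks concrete, here’s a minimal sketch of the pod-level diagnosis and the rolling-restart remediation, written against the official Python kubernetes client rather than Shoreline’s own notebook commands. The deployment name my-app, the namespace default, and the label selector app=my-app are hypothetical stand-ins.

```python
from datetime import datetime, timezone
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig; use load_incluster_config() inside a pod
apps = client.AppsV1Api()
core = client.CoreV1Api()

# Hypothetical target: deployment "my-app" in namespace "default"
NAMESPACE, DEPLOYMENT, SELECTOR = "default", "my-app", "app=my-app"

# Deployment-level check: desired vs. ready replicas
dep = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE)
print(f"replicas desired={dep.spec.replicas} ready={dep.status.ready_replicas}")

# Pod-level check: phase and restart counts flag crash loops and stuck pods
for pod in core.list_namespaced_pod(NAMESPACE, label_selector=SELECTOR).items:
    for cs in pod.status.container_statuses or []:
        print(pod.metadata.name, pod.status.phase, f"restarts={cs.restart_count}")

# Remediation: trigger a rolling restart by stamping the pod template annotation
# (the same mechanism `kubectl rollout restart` uses under the hood)
stamp = datetime.now(timezone.utc).isoformat()
apps.patch_namespaced_deployment(DEPLOYMENT, NAMESPACE, body={
    "spec": {"template": {"metadata": {"annotations": {
        "kubectl.kubernetes.io/restartedAt": stamp}}}}})
```

With the default RollingUpdate strategy, a restart triggered this way replaces pods gradually, so the deployment keeps serving traffic while misbehaving pods are recycled.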

Node Debugging Notebook 

  • Quickly review issues reported by Kubernetes, your cloud provider, and your machines themselves
  • Automatically assess the performance of every individual node (CPU, memory, root partition utilization) and detect contention between workloads sharing a node (noisy neighbor issues) to isolate any areas of concern
  • Automatically implement remediation steps to update node labels and annotations, cordon nodes, and delete nodes (sketched below)
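As a rough sketch of what the node-level checks boil down to, again using the Python kubernetes client: read the node conditions Kubernetes reports, pull per-node usage from the metrics.k8s.io API (which assumes metrics-server is installed in the cluster), and cordon a suspect node. The node name node-1 is hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Issues reported by Kubernetes itself: node conditions such as Ready,
# MemoryPressure, and DiskPressure. Ready should be "True"; the pressure
# conditions should be "False". Anything else is worth a look.
for node in core.list_node().items:
    for cond in node.status.conditions or []:
        healthy = (cond.status == "True") if cond.type == "Ready" else (cond.status == "False")
        if not healthy:
            print(node.metadata.name, cond.type, cond.status, cond.message)

# Per-node CPU and memory usage via the metrics.k8s.io API
# (assumes metrics-server is running in the cluster)
metrics = client.CustomObjectsApi().list_cluster_custom_object(
    "metrics.k8s.io", "v1beta1", "nodes")
for item in metrics["items"]:
    print(item["metadata"]["name"], item["usage"]["cpu"], item["usage"]["memory"])

# Remediation: cordon a suspect node so no new pods are scheduled onto it
core.patch_node("node-1", {"spec": {"unschedulable": True}})  # node name is hypothetical
```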

Service Debugging Notebook

  • Conduct basic checks throughout an entire system, from the front-end service to your software and everything in between
  • Run a series of checks related to load balancing, performance latency, ingress systems, service endpoints, and pod performance
  • Automatically implement remediation steps to rotate certificates, adjust security group rules to control which traffic can reach your service, or restart and patch the service to adjust its configuration (see the sketch after this list)
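And a minimal sketch of the service-level checks, assuming a hypothetical service my-service in the default namespace. An empty endpoint list is a classic root cause here: the service selector matches no pods, or every matching pod is failing its readiness probe.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Hypothetical target: service "my-service" in namespace "default"
NAMESPACE, SERVICE = "default", "my-service"

# Endpoint check: does the service have any ready backends?
eps = core.read_namespaced_endpoints(SERVICE, NAMESPACE)
for subset in eps.subsets or []:
    ready = [addr.ip for addr in (subset.addresses or [])]
    not_ready = [addr.ip for addr in (subset.not_ready_addresses or [])]
    print(f"ready={ready} not_ready={not_ready}")

# Recent warning events in the namespace often point straight at the culprit
# (failed probes, image pull errors, ingress misconfiguration, ...)
events = core.list_namespaced_event(NAMESPACE, field_selector="type=Warning")
for ev in events.items:
    print(ev.last_timestamp, ev.involved_object.kind, ev.involved_object.name, ev.reason)
```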

Shoreline’s Kubernetes debugging notebooks also automatically record the steps that were taken to assess and remediate a situation, removing the need for engineers to draft documentation and conduct lengthy handover meetings upon escalation. 

Highlights

Customer experience impact: High (potential hours of downtime)
Occurrence frequency: High
Shoreline time to repair: 1-2 minutes
Time to diagnose manually: until the root cause is identified
Security: Low
Cost impact: High
Time to repair manually: 1-2 hours
