VM Certificate Rotation

The problem

Sooner or later, every company gets bitten by expired certificates, and when it happens, the result can be a catastrophic outage. It happens to everyone, and it makes the team look bad. Usually the whole ops team then spends a couple of weeks trying to make sure it never happens again. The difficulty is that there are many certificate providers, and guaranteeing that renewed certificates reach all of your servers is tricky, particularly with ephemeral servers that are frequently spun up and down. There are also many corner cases that can leave an expired certificate in production.

The most reliable way to reduce the risk of an expired certificate is to check for expiry both at your network endpoints and in the certificate files actually loaded on your servers. An endpoint check alone is not enough, because a load balancer can mask a small number of servers behind it that have an expired certificate. Alarms can also be overlooked, so ideally an alarm should be coupled with an automated way to update the certificate.
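The alarm logic described above can be sketched as follows. This is a minimal illustration, not Shoreline's implementation: it assumes the expiry checks have already produced a map from each source (a public endpoint or an individual server's certificate file) to the number of days remaining, and the `remediate` callback, the source names, and the 30-day threshold are all hypothetical. Because each server is reported separately, one expiring backend still raises an alarm even when the load balancer's endpoint looks healthy.

```python
from typing import Callable, Dict, List

# Hypothetical threshold: alarm when fewer than this many days remain.
DEFAULT_WARN_DAYS = 30.0


def expiring_sources(days_left: Dict[str, float], warn_days: float = DEFAULT_WARN_DAYS) -> List[str]:
    """Return every source whose certificate expires within warn_days, sorted by name."""
    return sorted(src for src, days in days_left.items() if days < warn_days)


def alarm_and_remediate(
    days_left: Dict[str, float],
    remediate: Callable[[str], None],
    warn_days: float = DEFAULT_WARN_DAYS,
) -> List[str]:
    """Flag expiring sources and hand each one straight to an automated fix,
    so recovery does not depend on someone noticing the alarm."""
    flagged = expiring_sources(days_left, warn_days)
    for src in flagged:
        remediate(src)  # hypothetical hook, e.g. a certificate rotation script
    return flagged
```

For example, given `{"https://lb.example.com": 80.0, "vm-1:/etc/ssl/app.pem": 80.0, "vm-3:/etc/ssl/app.pem": 5.0}`, only the `vm-3` file is flagged and passed to `remediate`, even though the load balancer endpoint reports 80 days.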

The solution

While no approach will completely protect you against expired certificates, the Shoreline Certificate Rotation Op Pack comes pretty close. First, Shoreline builds a real-time inventory of your fleet and probes every HTTPS endpoint in it, reading when each endpoint's certificate will expire. This is a good first step, but some servers are hidden behind a load balancer. So, as a second check, Shoreline runs a Linux command on every VM and every container once an hour, also looking for soon-to-expire certificates.
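The two checks described above can be approximated with the Python standard library plus the `openssl` command-line tool. This is a sketch under stated assumptions, not the Op Pack's actual command: the function names are hypothetical, the certificate directory and file patterns are placeholders for wherever your services keep their certificates, and the endpoint probe assumes TLS on port 443 by default.

```python
import socket
import ssl
import subprocess
import time
from pathlib import Path
from typing import Dict, Optional, Tuple

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' into days remaining."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # interpreted as UTC
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def endpoint_days_left(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Check 1: ask a live TLS endpoint for its certificate and report days remaining."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # With verification enabled, an already-expired certificate raises
        # ssl.SSLCertVerificationError during the handshake; treat that as an alarm.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"])


def parse_openssl_enddate(output: str) -> str:
    """Extract the date from `openssl x509 -enddate -noout` output ('notAfter=...')."""
    line = output.strip()
    prefix = "notAfter="
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {line!r}")
    return line[len(prefix):]


def file_days_left(cert_dir: str, patterns: Tuple[str, ...] = ("*.pem", "*.crt")) -> Dict[str, float]:
    """Check 2: inspect certificate files on this host with the openssl CLI."""
    results: Dict[str, float] = {}
    base = Path(cert_dir)
    for pattern in patterns:
        for path in sorted(base.glob(pattern)):
            # openssl x509 reads only the first certificate in a PEM bundle.
            proc = subprocess.run(
                ["openssl", "x509", "-enddate", "-noout", "-in", str(path)],
                capture_output=True, text=True, check=False,
            )
            if proc.returncode == 0:  # non-certificate files (e.g. private keys) fail and are skipped
                results[str(path)] = days_until_expiry(parse_openssl_enddate(proc.stdout))
    return results
```

On a host, `file_days_left("/etc/myapp/tls")` (a hypothetical path) returns a map from file path to days remaining, and `endpoint_days_left("example.com")` reports the same figure for one endpoint; either result could feed the kind of per-source alarm the Op Pack raises.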

Once an expiring certificate is identified, the next step is rotating it. There are hundreds of certificate providers, so it's not practical for Shoreline to provide rotation scripts for every one of them. Shoreline does, however, provide an out-of-the-box script for Let’s Encrypt and is working to extend this to other common providers. The script both provisions the new certificates and propagates them wherever they are needed. Our customers can use this script as a template, updating it or replacing it with scripts they build for other certificate providers.
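One way to structure the template idea is a small provider interface: each certificate provider gets its own renewal class, while propagation is shared. The sketch below is hypothetical and is not Shoreline's script. It assumes `certbot` is installed and already manages a certificate under the given name, relies on certbot's default `/etc/letsencrypt/live/<name>/` layout, and leaves the actual copy-and-reload step to a caller-supplied `deploy` function, since how certificates reach your servers varies by environment.

```python
import subprocess
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CertBundle:
    """Paths to a renewed certificate chain and its private key."""
    fullchain: Path
    privkey: Path


class CertProvider(ABC):
    """Hypothetical plug-in point: implement one subclass per certificate provider."""

    @abstractmethod
    def renew(self, name: str) -> CertBundle:
        """Obtain a fresh certificate for `name` and return where it was written."""


class LetsEncryptProvider(CertProvider):
    """Renews through the certbot CLI, assuming certbot's default directory layout."""

    def __init__(self, live_dir: Path = Path("/etc/letsencrypt/live")) -> None:
        self.live_dir = live_dir

    def renew(self, name: str) -> CertBundle:
        # certbot decides whether the certificate is close enough to expiry to replace it.
        subprocess.run(["certbot", "renew", "--cert-name", name], check=True)
        live = self.live_dir / name
        return CertBundle(live / "fullchain.pem", live / "privkey.pem")


def rotate(
    provider: CertProvider,
    name: str,
    targets: Iterable[str],
    deploy: Callable[[str, CertBundle], None],
) -> List[str]:
    """Renew once, then propagate the same bundle to every target server."""
    bundle = provider.renew(name)
    deployed = []
    for target in targets:
        deploy(target, bundle)  # hypothetical: copy files and reload the web server
        deployed.append(target)
    return deployed
```

A provider for another certificate authority would subclass `CertProvider` and override `renew`, leaving `rotate` and the propagation logic unchanged.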

This Op Pack is a great way for Shoreline customers to decrease the risk of an expired certificate. At a minimum, it provides out-of-the-box alarms that look for expiring certificates in two separate ways. Because these checks run in a distributed way, at every endpoint and on every VM and container, they address many of the common causes of missed expiring certificates. Once an alarm fires, the Op Pack provides an out-of-the-box solution for Let’s Encrypt and a placeholder example of how automations can be built for other certificate providers.

Highlights

Customer experience impact: Total outage (High)
Occurrence frequency: 1-2 times per year (Low)
Shoreline time to repair
Time to diagnose manually
Security
Cost impact
Time to repair manually: A few hours (High)
