
Code to resize a disk in Kubernetes on AWS

Learn how to manually and automatically resize a disk, including error handling.
Sanjit Kalapatapu


Kubernetes is an amazing way to deploy stateless applications. For example, it’s easy to scale up and scale down replicas of your application using a deployment.

However, stateful applications that involve disks (e.g., mysql, rabbitmq, prometheus, elasticsearch) are different. When a disk fills up, you can’t just create a copy, as the copied disk will also be full.

You often need to resize the disk, but how do you do this in Kubernetes? There are some challenges here:  

  • How do we make sure the current data is intact?
  • How do we do this without corrupting the data as the application continues to write?  

Below, we’ll answer these questions and discuss how to automatically detect and fix this problem.

After reading this post you’ll learn:

  • The resources involved in Kubernetes volume management.
  • How to resize a disk, including errors and failures you’ll need to handle.
  • How to automatically detect a filling disk and trigger a resize using CloudWatch alarms and AWS Lambda.
  • Optionally, how you can automate this with Shoreline.

Resizing disks with code

Let’s start with a brief discussion of the quirks of Kubernetes volume management. The first quirk is naming: the resources involved are called PersistentVolumeClaim (PVC) and PersistentVolume (PV), and note that neither is called a disk. What do these terms mean?

A PersistentVolumeClaim declares the storage requirements for a Pod. For example, my Pod may declare that it needs a 20 GB drive.
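As a quick sketch, a claim for a 20 GB drive might look like the following manifest (the claim name here is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data          # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2      # AWS EBS general-purpose SSD
  resources:
    requests:
      storage: 20Gi          # the size we are asking for
```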

Let’s list all of the PersistentVolumeClaims in our cluster:

<code-embed>ubuntu@ip-172-31-24-27:~/bin$ kubectl get pvc --all-namespaces

NAMESPACE    NAME                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
monitoring   test12c-prometheus-data-test12c-prometheus-0   Bound    pvc-74ad6b6a-0b69-48bf-bd3c-9c69e2e04e40   16Gi       RWO            gp2            56d
monitoring   test12c-prometheus-data-test12c-prometheus-1   Bound    pvc-3044da8a-4ef7-4e35-8bff-e2c183efa595   16Gi       RWO            gp2            56d
shoreline2   elasticsearch-data-op-packs-es-default-0       Bound    pvc-0e0d69c6-5d1b-4de3-8ee9-4a0922b80057   1Gi        RWO            gp2            43d
shoreline2   elasticsearch-data-op-packs-es-default-1       Bound    pvc-90eefb06-59e0-42e9-935c-198965a5c283   1Gi        RWO            gp2            43d
shoreline2   elasticsearch-data-op-packs-es-default-2       Bound    pvc-f2ff9645-57a0-4f3d-b0f7-db1c10a32739   1Gi        RWO            gp2            43d
shoreline3   data-kafka-0                                   Bound    pvc-ce6738a0-ad33-4574-9c72-ffbda0bbfb8d   8Gi        RWO            gp2            8d
shoreline3   data-kafka-1                                   Bound    pvc-63e68221-3154-48c3-ab45-f5a4f86a9e5d   8Gi        RWO            gp2            8d
shoreline3   data-kafka-2                                   Bound    pvc-79da460b-1ace-4450-97a0-3329132f5b77   8Gi        RWO            gp2            8d
shoreline3   data-kafka-zookeeper-0                         Bound    pvc-94fa2588-d626-45fd-a67c-1fa076c4bd0b   8Gi        RWO            gp2            9d<code-embed>

Above, we can see each PersistentVolumeClaim, along with its namespace, capacity, and storage class. Note that all of the above have a status of “Bound”.

Kubernetes will fulfill this request by provisioning a PersistentVolume. When a PersistentVolumeClaim has a PersistentVolume, it is bound to that volume.

PersistentVolume represents the true volume that is connected to your Pod. It’s where the data really goes.

Here’s how to list your PersistentVolumes:

<code-embed>ubuntu@ip-172-31-24-27:~/bin$ kubectl get pv --all-namespaces

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                     STORAGECLASS   REASON   AGE
pvc-0e0d69c6-5d1b-4de3-8ee9-4a0922b80057   1Gi        RWO            Delete           Bound    shoreline2/elasticsearch-data-op-packs-es-default-0       gp2                     43d
pvc-3044da8a-4ef7-4e35-8bff-e2c183efa595   16Gi       RWO            Delete           Bound    monitoring/test12c-prometheus-data-test12c-prometheus-1   gp2                     56d
pvc-63e68221-3154-48c3-ab45-f5a4f86a9e5d   8Gi        RWO            Delete           Bound    shoreline3/data-kafka-1                                   gp2                     8d
pvc-74ad6b6a-0b69-48bf-bd3c-9c69e2e04e40   16Gi       RWO            Delete           Bound    monitoring/test12c-prometheus-data-test12c-prometheus-0   gp2                     56d
pvc-79da460b-1ace-4450-97a0-3329132f5b77   8Gi        RWO            Delete           Bound    shoreline3/data-kafka-2                                   gp2                     8d
pvc-90eefb06-59e0-42e9-935c-198965a5c283   1Gi        RWO            Delete           Bound    shoreline2/elasticsearch-data-op-packs-es-default-1       gp2                     43d
pvc-94fa2588-d626-45fd-a67c-1fa076c4bd0b   8Gi        RWO            Delete           Bound    shoreline3/data-kafka-zookeeper-0                         gp2                     9d
pvc-ce6738a0-ad33-4574-9c72-ffbda0bbfb8d   8Gi        RWO            Delete           Bound    shoreline3/data-kafka-0                                   gp2                     8d
pvc-f2ff9645-57a0-4f3d-b0f7-db1c10a32739   1Gi        RWO            Delete           Bound    shoreline2/elasticsearch-data-op-packs-es-default-2       gp2                     43d<code-embed>

AllowVolumeExpansion must be set to true

AllowVolumeExpansion is a big gotcha when configuring volumes in Kubernetes. Note above that every PersistentVolumeClaim has a StorageClass, which defines key parameters for a type of volume, such as performance. The parameter we need to check in this situation is allowVolumeExpansion. (Note: volume expansion requires Kubernetes 1.11 or later; earlier versions don’t support it at all.)

If it’s false, we can’t resize the volume; we need to make sure it is set to true. First, let’s check the current value:

<code-embed>ubuntu@ip-172-31-24-27:~/bin$ kubectl describe storageclass gp2

Name:            gp2
IsDefaultClass:  Yes
Parameters:            fsType=ext4,type=gp2
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none><code-embed>

Note that AllowVolumeExpansion for gp2 is unset. This is actually very common. Let’s set it to true.

<code-embed>ubuntu@ip-172-31-24-27:~/bin$ kubectl patch sc gp2 -p '{"allowVolumeExpansion": true}'
storageclass.storage.k8s.io/gp2 patched

ubuntu@ip-172-31-24-27:~/bin$ kubectl describe storageclass gp2
Name:            gp2
IsDefaultClass:  Yes
Parameters:            fsType=ext4,type=gp2
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none><code-embed>

Also note that the underlying driver needs to support volume expansion. For example, Amazon EBS eventually added expansion support via its Elastic Volumes feature.

You can check whether your volume driver supports expansion in the Kubernetes documentation’s table of volume types.

Patching the PVC to change its size

Now let’s get to the matter at hand: resizing. To do that, we adjust the PersistentVolumeClaim by increasing its requested capacity. This triggers a resize of the PersistentVolume bound to the claim.

This process is just like everything else in Kubernetes: we tell Kubernetes what we want, not how to do it. Underneath, Kubernetes will reach out to the volume driver to resize the disk. In this case, that’s going to be the EBS API.
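For illustration only, the driver’s work for an EBS-backed volume is roughly equivalent to calling the EBS Elastic Volumes API yourself (the volume ID below is a placeholder; don’t modify PVC-managed volumes by hand):

```shell
# Roughly what the volume driver does on our behalf.
# vol-0123456789abcdef0 is a placeholder volume ID.
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 32

# The modification itself is asynchronous; track its progress here
# (states move from modifying/optimizing to completed).
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0
```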

Let’s increase our Prometheus server’s volume from 16 GB to 32 GB:

<code-embed>ubuntu@ip-172-31-24-27:~/bin$ kubectl patch pvc test12c-prometheus-data-test12c-prometheus-0 -n monitoring -p '{"spec":{"resources":{"requests":{"storage":"32Gi"}}}}'
persistentvolumeclaim/test12c-prometheus-data-test12c-prometheus-0 patched<code-embed>

Let’s see if the resize event has gone through. Note that you might not see anything at first; the operation is asynchronous, so you may have to run this command in a loop.

<code-embed>ubuntu@ip-172-31-24-27:~/bin$ kubectl get events -n monitoring

LAST SEEN   TYPE     REASON                       OBJECT                                                               MESSAGE
17s         Normal   FileSystemResizeSuccessful   pod/test12c-prometheus-0                                             MountVolume.NodeExpandVolume succeeded for volume "pvc-74ad6b6a-0b69-48bf-bd3c-9c69e2e04e40"
17s         Normal   FileSystemResizeSuccessful   persistentvolumeclaim/test12c-prometheus-data-test12c-prometheus-0   MountVolume.NodeExpandVolume succeeded for volume "pvc-74ad6b6a-0b69-48bf-bd3c-9c69e2e04e40"<code-embed>

As you can see, our disk was resized! Let’s do one final check to confirm it’s 32 GB.
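That final check is a one-liner against the claim we just patched (run against your cluster; the capacity recorded on the claim only updates once the resize completes):

```shell
# Read the capacity recorded on the PVC's status.
kubectl get pvc test12c-prometheus-data-test12c-prometheus-0 -n monitoring \
  -o jsonpath='{.status.capacity.storage}'
```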

Resize limitations and handling failures

Since resizing is an async operation, we don’t know how long it will take. The larger the disk and/or the change in disk size, the longer the resize can take. That’s why we might need to keep checking for events confirming that the resize has taken place.
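A polling loop for that can be sketched as follows. Here check_resized is a hypothetical stand-in for the real check (e.g., grepping kubectl get events for FileSystemResizeSuccessful), faked so the sketch runs anywhere:

```shell
#!/bin/bash
# Poll until the check succeeds or we hit the attempt limit.
MAX_STATUS_CHECKS=5
STATUS_CHECK_WAIT_SEC=0   # use something like 15-30 seconds against a real cluster

ATTEMPTS=0
check_resized() {
  # Stand-in for the real check; here it fakes success on the third attempt.
  ATTEMPTS=$((ATTEMPTS + 1))
  [ "$ATTEMPTS" -ge 3 ]
}

for (( i=1; i<=MAX_STATUS_CHECKS; i++ )); do
  if check_resized; then
    echo "resize confirmed after $i checks"
    break
  fi
  sleep "$STATUS_CHECK_WAIT_SEC"
done
```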

Furthermore, we are limited in how often we can resize. The Amazon EC2 documentation states, “you must wait at least six hours and ensure that the volume is in the in-use or available state before you can modify the same volume.” If you hit this limit, you’ll get an error from AWS and then an error in Kubernetes. Also note that not all file systems support resizing; only XFS, Ext3, and Ext4 support automatic resize.

Automating detection and resize using CloudWatch Alarms + Lambda

Why automate? As most ops folks know, manually resizing a disk does not make sense in many situations. This creates a lot of work for the team, and can hurt the end user/customer experience. Here is an example of how to automate the work we did previously by using a scheduled CloudWatch event and a Lambda.

  • First, let’s create a shell script that computes disk usage.
  • Note that we’ll need a regex ($DISK_REGEX) to specify which disk we’re targeting.

<code-embed>df -h | grep "$DISK_REGEX" | awk '{ print $5 }' | sed 's/.$//'<code-embed>
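To see what that pipeline does without a live mount, we can feed it a canned line of df output (the device and mount point below are made up):

```shell
# grep picks the line for the mount we care about, awk takes the Use% column
# (field 5), and sed strips the trailing '%' so the value compares as a number.
DISK_REGEX="/data"                                    # hypothetical mount point
DF_LINE="/dev/nvme1n1     16G   13G  3.0G  82% /data"
echo "$DF_LINE" | grep "$DISK_REGEX" | awk '{ print $5 }' | sed 's/.$//'
```

This prints 82, which the later steps compare against a threshold.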

  • Then let’s augment it to iterate over each of our pods

<code-embed>for pod in $(kubectl get pods -n $POD_NAMESPACE | grep Running | awk '{print $1}'); do
 kubectl exec -it $pod -n $POD_NAMESPACE -- df -h | grep "$DISK_REGEX" | awk '{ print $5 }' | sed 's/.$//'
done<code-embed>

  • Then let’s add a condition: if the disk is too full, we patch the PVC to trigger a resize.

<code-embed>for pod in $(kubectl get pods -n $POD_NAMESPACE | grep Running | awk '{print $1}'); do
 CURR_SIZE=$(kubectl exec -it $pod -n $POD_NAMESPACE -- df -h | grep "$DISK_REGEX" | awk '{ print $5 }' | sed 's/.$//')
 if [ $CURR_SIZE -ge $THRESHOLD ]; then
     PATCH_JSON='{"spec":{"resources":{"requests":{"storage":"'$NEW_PVC_SIZE$PVC_UNIT'"}}}}'
     echo "attempting to patch $PVC with json: $PATCH_JSON"
     kubectl patch pvc $PVC -n $POD_NAMESPACE -p "$PATCH_JSON"
 fi
done<code-embed>

  • We’ll also need to loop over the result and keep checking, because it’s an async call.
  • We’ll also need to handle errors.

<code-embed>for pod in $(kubectl get pods -n $POD_NAMESPACE | grep Running | awk '{print $1}'); do
 CURR_SIZE=$(kubectl exec -it $pod -n $POD_NAMESPACE -- df -h | grep "$DISK_REGEX" | awk '{ print $5 }' | sed 's/.$//')
 if [ $CURR_SIZE -ge $THRESHOLD ]; then
     PATCH_JSON='{"spec":{"resources":{"requests":{"storage":"'$NEW_PVC_SIZE$PVC_UNIT'"}}}}'
     echo "attempting to patch $PVC with json: $PATCH_JSON"
     kubectl patch pvc $PVC -n $POD_NAMESPACE -p "$PATCH_JSON"
     for (( i=1; i<=$MAX_STATUS_CHECKS; i++ )); do
         echo "checking patch status, number of attempts $i"
         LATEST_STATUS=$(kubectl get events -n $POD_NAMESPACE -o json | jq --arg POD_NAMESPACE $POD_NAMESPACE --arg PVC $PVC -c '[.items | .[] | select(.involvedObject.name==$PVC and .involvedObject.namespace==$POD_NAMESPACE and .involvedObject.kind=="PersistentVolumeClaim")] | last | {type: .type, reason: .reason, message: .message}')
         STATUS_TYPE=$(echo $LATEST_STATUS | jq -c '.type')
         echo "STATUS_TYPE=$STATUS_TYPE"
         STATUS_REASON=$(echo $LATEST_STATUS | jq -c '.reason')
         echo "STATUS_REASON=$STATUS_REASON"
         if [ $STATUS_TYPE = "\"Normal\"" ] && [ $STATUS_REASON = "\"FileSystemResizeSuccessful\"" ]; then
             echo "update pvc patch succeeded for pvc=$PVC with new size=$NEW_PVC_SIZE$PVC_UNIT"
             exit 0
         else
             echo "update pvc patch failed for pvc=$PVC with status=$LATEST_STATUS"
         fi
         sleep $STATUS_CHECK_WAIT_SEC
     done
     echo "reached maximum patch status check limit, exiting script without successful patch."
     exit 1
 fi
done<code-embed>

Next, let’s put this in a Docker container. Follow this tutorial to wrap the script in the AWS Lambda container interface and build your container image. Set up an ECR repository and push your container image to it. Then follow this tutorial to create a scheduled CloudWatch event that triggers the Lambda function.

Unfortunately, this still doesn’t execute in parallel (i.e., it will get slower as you manage more containers), and the interval between checks of any given container will keep growing. In addition, you’ll need to integrate the container with your secrets propagation service; this is particularly important because the Lambda needs your kubectl credentials. Worst case, you can build your secrets into the container, but that’s insecure and will break when you rotate your secrets.

Automate detection and resize with Shoreline

In our example above, we only resized a volume one time. But going forward, how do we know when another resize is necessary?  This is actually difficult to determine; often the demands on our databases or other stateful systems are dynamic. We don’t know when we’ll get a traffic spike.

That means that we’ll have to constantly monitor. When a resize is needed, we’ll then need to go through the above operations and handle any errors that come up. We’ll need a runbook for this.

All of the issues we’ve just discussed are things that Shoreline has worked to address as part of our platform. Within our platform we have Op Packs: prebuilt automations that give you the metrics you’re going to track, the alarms that will fire, the actions that can be taken, and the scripts that will be run by those actions. You’ll have the option of running these Op Packs manually on the Shoreline platform, or turning on the automated version once you’re comfortable.

Shoreline’s disk Op Pack automatically detects the need for a resize and kicks off the process. We can even use Terraform to configure the Op Pack. We’ll set it up to monitor all of our Prometheus servers, and when any of the disks get to 80% full, automatically add 10 GB of storage, up to 200 GB. It runs on GCP and AWS.

<code-embed>module "pvc_autoscale" {
  source         = "/pvc_autoscale"
  pvc_regex      = "prometheus-data"
  disk_threshold = 80
  increment      = 10
  max_size       = 200
  resource_query = "host | pod | namespace='prometheus' | app='prometheus'"
}<code-embed>
We just need to do a terraform apply to store this configuration into the cluster:

<code-embed>SHORELINE_URL= terraform apply<code-embed>

This uses our verified terraform provider. With that configuration, Shoreline will install local control loops on each node, continuously searching for full disks. If a full disk is detected, the resize is automatically started.

We handle the asynchronous nature of the request and any failures that arise along the way. You won’t see this ticket again. The same approach can be applied to every stateful application you have: the disk resize issue is squashed across the fleet, in 8 lines of Terraform.

Please reach out for a demo.
