Volume Downsize

This runbook will show you how to perform a volume downsize. The usual operation is to extend a volume, but in some cases you might have over-dimensioned your volumes and might need to downsize your volumes, in order to reduce costs.

Scenario

Assume you have a StackGres cluster with:

  • Instances: 3
  • Namespace: ongres-db
  • Cluster name: ongres-db
  • Volume size: 20Gi
$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl list

+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 | Leader | running |  3 |           |
| ongres-db-1 | 10.0.0.10:7433 |        | running |  3 |         0 |
| ongres-db-2 | 10.0.6.9:7433  |        | running |  3 |         0 |
+-------------+----------------+--------+---------+----+-----------+

Verify the PVC’s:

$ kubectl get pvc -n ongres-db

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
distributedlogs-data-distributedlogs-0   Bound    pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1   50Gi       RWO            standard       162m
ongres-db-data-ongres-db-0               Bound    pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f   20Gi       RWO            gp2-data       11m
ongres-db-data-ongres-db-1               Bound    pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44   20Gi       RWO            gp2-data       10m
ongres-db-data-ongres-db-2               Bound    pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148   20Gi       RWO            gp2-data       4m47s

Assuming the disk size is over-dimensioned, and you need to perform a downsize to 15Gi.

Performing a Switchover

Perform a switchover to the pod with the higher index number (ongres-db-2).

Execute:

kubectl exec -it -n ongres-db ongres-db-0 -c patroni -- patronictl switchover

Master [ongres-db-0]:

Candidate ['ongres-db-1', 'ongres-db-2'] []: ongres-db-2

When should the switchover take place (e.g. 2021-01-15T16:40 )  [now]:

Current cluster topology
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 | Leader | running |  3 |           |
| ongres-db-1 | 10.0.0.10:7433 |        | running |  3 |         0 |
| ongres-db-2 | 10.0.6.9:7433  |        | running |  3 |         0 |
+-------------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster ongres-db, demoting current master ongres-db-0? [y/N]:y


2021-01-15 15:41:11.93457 Successfully switched over to "ongres-db-2"
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 |        | stopped |    |   unknown |
| ongres-db-1 | 10.0.0.10:7433 |        | running |  3 |         0 |
| ongres-db-2 | 10.0.6.9:7433  | Leader | running |  3 |           |
+-------------+----------------+--------+---------+----+-----------+

Now, check the cluster state:

$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl list
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 |        | running |  4 |         0 |
| ongres-db-1 | 10.0.0.10:7433 |        | running |  4 |         0 |
| ongres-db-2 | 10.0.6.9:7433  | Leader | running |  4 |           |
+-------------+----------------+--------+---------+----+-----------+

Editing the SGCluster Definition

As the downsize is not a common situation, it is necessary to temporary remove the StackGres operator validating-webhook, so first, create a backup of the yaml manifest:

Execute:

kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io stackgres-operator -o yaml > validating-webhook-stackgres-operator.yaml

Now delete the StackGres operator validating-webhook exucting:

kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io stackgres-operator

WARNING: Note that removing the validating webhook might potentially lead to some error in the resources that you may have to solve manually later if after restarting the operator Pod any error arise during the update of the existing resources that the operator executes on bootstrap. Usually manual intervention is not needed, but you should be aware of this.

Now, edit the StackGres cluster volume definition to the new size:

kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p '[{ "op": "replace", "path": "/spec/pods/persistentVolume/size", "value": "10Gi" }]'

You’ll get the following message:

sgcluster.stackgres.io/ongres-db patched

Now, if you check the events you will see an error like:

kubectl get events -n ongres-db

....
Failure executing: PATCH at: https://10.96.0.1/apis/apps/v1/namespaces/ongres-db/statefulsets/ongres-db. Message: StatefulSet.apps "ongres-db" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec, message=Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden, reason=FieldValueForbidden, additionalProperties={})], group=apps, kind=StatefulSet, name=ongres-db, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=StatefulSet.apps "ongres-db" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
....

This is expected because is forbidden to change the spec of a stateful set.

Delete the stateful set and let the StackGres operator recreate it:

$ kubectl delete sts -n ongres-db ongres-db --cascade=orphan

Important Note: Do not forget the parameter --cascade=orphan because this will keep the existing pods.

Verifying the StatefulSet

Verify that the stateful set now has the new volume size:

$ kubectl describe sts -n ongres-db ongres-db | grep -i capacity
  Capacity:      15Gi

At this moment it is recommended to resotre the StackGres operator validating-webhook:

kubectl create -f validating-webhook-stackgres-operator.yaml

Editing the replica size

Edit the replica size to 1:

$ kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p '[{ "op": "replace", "path": "/spec/instances", "value": 1 }]'

Once you decrease the replicas, you’ll see something like:

$ kubectl get pods -n ongres-db
NAME                READY   STATUS    RESTARTS   AGE
distributedlogs-0   2/2     Running   0          3h4m
ongres-db-2         6/6     Running   0          27m

Deleting the Unused PVCs and PVs

Proceed to delete the unused PVCs ongres-db-data-ongres-db-0 and ongres-db-data-ongres-db-1:

$ kubectl delete pvc -n ongres-db ongres-db-data-ongres-db-0
persistentvolumeclaim "ongres-db-data-ongres-db-0" deleted

$ kubectl delete pvc -n ongres-db ongres-db-data-ongres-db-1
persistentvolumeclaim "ongres-db-data-ongres-db-1" deleted

This will release the persistent volumes and then you can proceed to delete them:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                              STORAGECLASS   REASON   AGE
pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148   20Gi       RWO            Retain           Bound      ongres-db/ongres-db-data-ongres-db-2               gp2-data                32m
pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1   50Gi       RWO            Delete           Bound      ongres-db/distributedlogs-data-distributedlogs-0   standard                3h10m
pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f   20Gi       RWO            Retain           Released   ongres-db/ongres-db-data-ongres-db-0               gp2-data                39m
pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44   20Gi       RWO            Retain           Released   ongres-db/ongres-db-data-ongres-db-1               gp2-data                38m

Delete the disks with Released status:

$ kubectl delete pv pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f
persistentvolume "pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f" deleted

$ kubectl delete pv pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44
persistentvolume "pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44" deleted

Increasing the Replica Size

Increase the replica size to 2:

$ kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p '[{ "op": "replace", "path": "/spec/instances", "value": 2 }]'

Now, your cluster will have 2 pods:

$ kubectl get pods -n ongres-db
NAME                READY   STATUS    RESTARTS   AGE
distributedlogs-0   2/2     Running   0          3h15m
ongres-db-0         6/6     Running   0          49s
ongres-db-2         6/6     Running   0          37m

Check again the cluster state:

$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl list
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.12:7433 |        | running |  4 |         0 |
| ongres-db-2 | 10.0.6.9:7433  | Leader | running |  4 |           |
+-------------+----------------+--------+---------+----+-----------+

And the new pod will have the new disk size:

$ kubectl get pvc -n ongres-db
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
distributedlogs-data-distributedlogs-0   Bound    pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1   50Gi       RWO            standard       3h17m
ongres-db-data-ongres-db-0               Bound    pvc-37d96872-b132-4a89-a579-d87f8cf1fa92   15Gi       RWO            gp2-data       2m47s
ongres-db-data-ongres-db-2               Bound    pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148   20Gi       RWO            gp2-data       39m

Performing a Switchover

Perform another switchover, this time to node ongres-db-0:

$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl switchover
Master [ongres-db-2]:
Candidate ['ongres-db-0'] []: ongres-db-0
When should the switchover take place (e.g. 2021-01-15T17:12 )  [now]:
Current cluster topology
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.12:7433 |        | running |  4 |         0 |
| ongres-db-2 | 10.0.6.9:7433  | Leader | running |  4 |           |
+-------------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster ongres-db, demoting current master ongres-db-2? [y/N]: y
2021-01-15 16:12:57.14561 Successfully switched over to "ongres-db-0"
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.12:7433 | Leader | running |  4 |           |
| ongres-db-2 | 10.0.6.9:7433  |        | stopped |    |   unknown |
+-------------+----------------+--------+---------+----+-----------+

This will delete the pod ongres-db-2 and create the pod ongres-db-1

NAME                READY   STATUS    RESTARTS   AGE
distributedlogs-0   2/2     Running   0          3h19m
ongres-db-0         6/6     Running   0          4m51s
ongres-db-1         6/6     Running   0          41s

You can proceed to delete the PVC and PV of ongres-db-2

$ kubectl delete pvc -n ongres-db ongres-db-data-ongres-db-2
persistentvolumeclaim "ongres-db-data-ongres-db-2" deleted

$ kubectl delete pv pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148
persistentvolume "pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148" deleted

Now, your cluster will have the new, reduced disk size:

$ kubectl get pvc -n ongres-db
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
distributedlogs-data-distributedlogs-0   Bound    pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1   50Gi       RWO            standard       3h24m
ongres-db-data-ongres-db-0               Bound    pvc-37d96872-b132-4a89-a579-d87f8cf1fa92   15Gi       RWO            gp2-data       9m21s
ongres-db-data-ongres-db-1               Bound    pvc-46c1433b-26e8-422c-aecf-145b1bb5aac1   15Gi       RWO            gp2-data       5m11s

Last step

As you temporary removed the validating-webhook it is necessary to restart the StackGres Operator pod.

Execute:

kubectl delete pod -n stackgres -l app=stackgres-operator

Check the pod started successfully:

Execute:

kubectl get pod -n stackgres -l app=stackgres-operator

The output should be like:

NAME                                  READY   STATUS    RESTARTS   AGE
stackgres-operator-85df9c556c-c242s   1/1     Running   0          79s