This runbook shows how to downsize a volume. Volumes are usually extended, but if yours are over-dimensioned you may need to shrink them to reduce costs.
Assume you have a StackGres cluster with:
Instances: 3
Namespace: ongres-db
Cluster name: ongres-db
Volume size: 20Gi
$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl list
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 | Leader | running | 3 | |
| ongres-db-1 | 10.0.0.10:7433 | | running | 3 | 0 |
| ongres-db-2 | 10.0.6.9:7433 | | running | 3 | 0 |
+-------------+----------------+--------+---------+----+-----------+
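If you script this runbook, it helps to read the current leader from the patronictl list output instead of reading the table by eye. The following is a minimal sketch, not part of StackGres or Patroni: leader_of is a hypothetical helper name, and it assumes the pipe-separated table layout shown above (other Patroni versions may label columns differently).

```shell
# leader_of: print the member whose Role column is "Leader".
# Hypothetical helper; reads `patronictl list` output on stdin and
# assumes the column order Member | Host | Role | State | TL | Lag.
leader_of() {
  awk -F'|' '
    {
      role = $4
      gsub(/^[ \t]+|[ \t]+$/, "", role)   # trim surrounding blanks
    }
    role == "Leader" {
      name = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      print name
    }
  '
}

# Example (needs access to the cluster; -it is dropped because the
# output is piped):
# kubectl exec -n ongres-db ongres-db-2 -c patroni -- patronictl list | leader_of
```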
Verify the PVCs:
$ kubectl get pvc -n ongres-db
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
distributedlogs-data-distributedlogs-0 Bound pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1 50Gi RWO standard 162m
ongres-db-data-ongres-db-0 Bound pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f 20Gi RWO gp2-data 11m
ongres-db-data-ongres-db-1 Bound pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44 20Gi RWO gp2-data 10m
ongres-db-data-ongres-db-2 Bound pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148 20Gi RWO gp2-data 4m47s
Assume the disks are over-dimensioned and you need to downsize them to 15Gi.
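Before shrinking, it is worth confirming that the data fits in the target size with some room to spare, since every new replica holds a full copy of the database. A rough sketch of that check follows. fits_in_target is a hypothetical helper, the 20% default headroom is an arbitrary assumption, and the data path in the usage comment is an assumption about where the volume is mounted in your pods.

```shell
# fits_in_target USED_KIB TARGET_GIB [HEADROOM_PERCENT]
# Succeeds when USED_KIB fits in TARGET_GIB after reserving the
# given headroom (default 20%, an arbitrary assumption).
fits_in_target() (
  used_kib=$1
  target_gib=$2
  headroom=${3:-20}
  target_kib=$((target_gib * 1024 * 1024))
  budget_kib=$((target_kib * (100 - headroom) / 100))
  [ "$used_kib" -le "$budget_kib" ]
)

# Example (the mount path is an assumption, adjust it to your setup):
# used_kib=$(kubectl exec -n ongres-db ongres-db-2 -c patroni -- \
#   df -Pk /var/lib/postgresql | awk 'NR==2 {print $3}')
# fits_in_target "$used_kib" 15 || echo "data does not fit in 15Gi with headroom"
```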
Perform a switchover to the pod with the highest index number (ongres-db-2):
kubectl exec -it -n ongres-db ongres-db-0 -c patroni -- patronictl switchover
Master [ongres-db-0]:
Candidate ['ongres-db-1', 'ongres-db-2'] []: ongres-db-2
When should the switchover take place (e.g. 2021-01-15T16:40 ) [now]:
Current cluster topology
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 | Leader | running | 3 | |
| ongres-db-1 | 10.0.0.10:7433 | | running | 3 | 0 |
| ongres-db-2 | 10.0.6.9:7433 | | running | 3 | 0 |
+-------------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster ongres-db, demoting current master ongres-db-0? [y/N]:y
2021-01-15 15:41:11.93457 Successfully switched over to "ongres-db-2"
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 | | stopped | | unknown |
| ongres-db-1 | 10.0.0.10:7433 | | running | 3 | 0 |
| ongres-db-2 | 10.0.6.9:7433 | Leader | running | 3 | |
+-------------+----------------+--------+---------+----+-----------+
Now, check the cluster state:
$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl list
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.11:7433 | | running | 4 | 0 |
| ongres-db-1 | 10.0.0.10:7433 | | running | 4 | 0 |
| ongres-db-2 | 10.0.6.9:7433 | Leader | running | 4 | |
+-------------+----------------+--------+---------+----+-----------+
Because downsizing is not a common operation, you need to temporarily remove the StackGres operator validating webhook. First, back up its YAML manifest:
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io stackgres-operator -o yaml > validating-webhook-stackgres-operator.yaml
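Since the next step deletes the webhook, you may want to confirm that the backup file was actually written. A small sketch, assuming the file name used above. backup_ok is a hypothetical helper and only checks that the file is non-empty and declares the expected kind; it does not validate the rest of the manifest.

```shell
# backup_ok FILE: succeed if FILE is non-empty and looks like a
# ValidatingWebhookConfiguration manifest. Hypothetical helper.
backup_ok() {
  [ -s "$1" ] && grep -q '^kind: ValidatingWebhookConfiguration$' "$1"
}

# Example:
# backup_ok validating-webhook-stackgres-operator.yaml \
#   || echo "backup is missing or incomplete, do not delete the webhook"
```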
Now delete the StackGres operator validating webhook by executing:
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io stackgres-operator
WARNING: Removing the validating webhook can lead to errors in your resources. When the operator Pod restarts, it updates the existing resources during bootstrap; if any of those updates fails, you will have to fix the affected resources manually. Manual intervention is usually not needed, but you should be aware of the risk.
Now, edit the StackGres cluster volume definition to the new size:
kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p '[{ "op": "replace", "path": "/spec/pods/persistentVolume/size", "value": "15Gi" }]'
You’ll get the following message:
sgcluster.stackgres.io/ongres-db patched
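When scripting this step, generating the patch from a single variable keeps the size in the command in sync with the size you intend. The sketch below uses a hypothetical size_patch helper that prints the same JSON patch for a given size. It rejects values that are not a whole number followed by Mi, Gi or Ti, which is a deliberately narrow assumption and not the full Kubernetes quantity syntax.

```shell
# size_patch SIZE: print the JSON patch that sets the volume size.
# Hypothetical helper; accepts only values such as 500Mi, 15Gi, 1Ti.
size_patch() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(Mi|Gi|Ti)$' || return 1
  printf '[{ "op": "replace", "path": "/spec/pods/persistentVolume/size", "value": "%s" }]\n' "$1"
}

# Example:
# kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p "$(size_patch 15Gi)"
```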
Now, if you check the events you will see an error like:
kubectl get events -n ongres-db
....
Failure executing: PATCH at: https://10.96.0.1/apis/apps/v1/namespaces/ongres-db/statefulsets/ongres-db. Message: StatefulSet.apps "ongres-db" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec, message=Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden, reason=FieldValueForbidden, additionalProperties={})], group=apps, kind=StatefulSet, name=ongres-db, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=StatefulSet.apps "ongres-db" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
....
This is expected: as the message says, only the replicas, template, and updateStrategy fields of a StatefulSet spec can be updated, and the volume size is not one of them.
Delete the stateful set and let the StackGres operator recreate it:
$ kubectl delete sts -n ongres-db ongres-db --cascade=orphan
Important: do not omit the --cascade=orphan parameter, because it keeps the existing pods when the StatefulSet is deleted.
Verify that the stateful set now has the new volume size:
$ kubectl describe sts -n ongres-db ongres-db | grep -i capacity
Capacity: 15Gi
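The operator may take a moment to recreate the StatefulSet, so a scripted run needs to wait before verifying. The following sketch polls until the StatefulSet reports the expected size in its first volume claim template. sts_volume_size, wait_for_sts_size and the POLL_SECONDS variable are hypothetical names, and reading the first template is an assumption about how the StatefulSet is laid out.

```shell
# sts_volume_size NAMESPACE NAME: print the storage request of the
# first volume claim template (assumed to be the data volume).
sts_volume_size() {
  kubectl get sts -n "$1" "$2" \
    -o jsonpath='{.spec.volumeClaimTemplates[0].spec.resources.requests.storage}'
}

# wait_for_sts_size NAMESPACE NAME SIZE [TRIES]: poll until the
# StatefulSet exists with the expected size, or give up after TRIES
# attempts (default 30), sleeping POLL_SECONDS (default 2) in between.
wait_for_sts_size() {
  wfs_tries=${4:-30}
  while [ "$wfs_tries" -gt 0 ]; do
    if [ "$(sts_volume_size "$1" "$2" 2>/dev/null)" = "$3" ]; then
      return 0
    fi
    wfs_tries=$((wfs_tries - 1))
    sleep "${POLL_SECONDS:-2}"
  done
  return 1
}

# Example:
# wait_for_sts_size ongres-db ongres-db 15Gi || echo "StatefulSet not recreated yet"
```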
At this point, it is recommended to restore the StackGres operator validating webhook:
kubectl create -f validating-webhook-stackgres-operator.yaml
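Depending on your Kubernetes version, the API server may reject the saved manifest because it still carries server-populated metadata such as resourceVersion and uid. If that happens, one option is to strip those fields before recreating the webhook. This is a sketch that assumes the two-space indentation kubectl uses for metadata fields; strip_server_fields is a hypothetical helper name.

```shell
# strip_server_fields: drop server-populated metadata lines from a
# manifest read on stdin. Hypothetical helper; it matches fields at
# two-space indentation only, as printed by `kubectl get -o yaml`.
strip_server_fields() {
  sed -E '/^  (creationTimestamp|generation|resourceVersion|selfLink|uid):/d'
}

# Example:
# strip_server_fields < validating-webhook-stackgres-operator.yaml | kubectl create -f -
```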
Reduce the number of instances to 1:
$ kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p '[{ "op": "replace", "path": "/spec/instances", "value": 1 }]'
Once you decrease the replicas, you’ll see something like:
$ kubectl get pods -n ongres-db
NAME READY STATUS RESTARTS AGE
distributedlogs-0 2/2 Running 0 3h4m
ongres-db-2 6/6 Running 0 27m
Proceed to delete the unused PVCs ongres-db-data-ongres-db-0 and ongres-db-data-ongres-db-1:
$ kubectl delete pvc -n ongres-db ongres-db-data-ongres-db-0
persistentvolumeclaim "ongres-db-data-ongres-db-0" deleted
$ kubectl delete pvc -n ongres-db ongres-db-data-ongres-db-1
persistentvolumeclaim "ongres-db-data-ongres-db-1" deleted
Because these volumes use the Retain reclaim policy, deleting the PVCs only releases the persistent volumes. List them so you can delete them next:
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148 20Gi RWO Retain Bound ongres-db/ongres-db-data-ongres-db-2 gp2-data 32m
pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1 50Gi RWO Delete Bound ongres-db/distributedlogs-data-distributedlogs-0 standard 3h10m
pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f 20Gi RWO Retain Released ongres-db/ongres-db-data-ongres-db-0 gp2-data 39m
pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44 20Gi RWO Retain Released ongres-db/ongres-db-data-ongres-db-1 gp2-data 38m
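With many volumes in the cluster, picking out the released ones by eye is error-prone. The sketch below filters the kubectl get pv listing for Released volumes whose claim belonged to a given namespace. released_pvs is a hypothetical helper and relies on the default column order shown above.

```shell
# released_pvs NAMESPACE: read `kubectl get pv` output on stdin and
# print the names of Released volumes claimed from NAMESPACE.
# Hypothetical helper; assumes STATUS is column 5 and CLAIM column 6.
released_pvs() {
  awk -v ns="$1" '$5 == "Released" && index($6, ns "/") == 1 { print $1 }'
}

# Example:
# kubectl get pv --no-headers | released_pvs ongres-db
```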
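Delete the persistent volumes with Released status. Note that with the Retain reclaim policy this removes only the Kubernetes object, so the backing disk may still have to be deleted in your storage provider: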
$ kubectl delete pv pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f
persistentvolume "pvc-a2aa5198-c553-4e0d-a1e1-914669abb69f" deleted
$ kubectl delete pv pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44
persistentvolume "pvc-c724b2bf-cf17-4f57-a882-3a5da6947f44" deleted
Increase the number of instances to 2:
$ kubectl patch sgclusters -n ongres-db ongres-db --type='json' -p '[{ "op": "replace", "path": "/spec/instances", "value": 2 }]'
Now, your cluster will have 2 pods:
$ kubectl get pods -n ongres-db
NAME READY STATUS RESTARTS AGE
distributedlogs-0 2/2 Running 0 3h15m
ongres-db-0 6/6 Running 0 49s
ongres-db-2 6/6 Running 0 37m
Check the cluster state again:
$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl list
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.12:7433 | | running | 4 | 0 |
| ongres-db-2 | 10.0.6.9:7433 | Leader | running | 4 | |
+-------------+----------------+--------+---------+----+-----------+
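Before handing leadership to the new pod, the replica should be running and caught up. The following is a hedged sketch of that check, assuming the table layout shown above, where a healthy replica reports the state running and a lag of 0 (other Patroni versions may print different values, for example streaming). replicas_ready is a hypothetical helper.

```shell
# replicas_ready: read `patronictl list` output on stdin and succeed
# only if there is at least one replica and every replica is in state
# "running" with a lag of 0. Hypothetical helper.
replicas_ready() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 7 {
      member = trim($2); role = trim($4)
      state = trim($5); lag = trim($7)
      if (member == "Member" || role == "Leader") next   # header, leader
      replicas++
      if (state != "running" || lag != "0") bad++
    }
    END { exit ((replicas > 0 && bad == 0) ? 0 : 1) }
  '
}

# Example:
# kubectl exec -n ongres-db ongres-db-2 -c patroni -- patronictl list \
#   | replicas_ready || echo "replica is not ready for a switchover"
```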
The new pod has the new disk size:
$ kubectl get pvc -n ongres-db
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
distributedlogs-data-distributedlogs-0 Bound pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1 50Gi RWO standard 3h17m
ongres-db-data-ongres-db-0 Bound pvc-37d96872-b132-4a89-a579-d87f8cf1fa92 15Gi RWO gp2-data 2m47s
ongres-db-data-ongres-db-2 Bound pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148 20Gi RWO gp2-data 39m
Perform another switchover, this time to the pod ongres-db-0:
$ kubectl exec -it -n ongres-db ongres-db-2 -c patroni -- patronictl switchover
Master [ongres-db-2]:
Candidate ['ongres-db-0'] []: ongres-db-0
When should the switchover take place (e.g. 2021-01-15T17:12 ) [now]:
Current cluster topology
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.12:7433 | | running | 4 | 0 |
| ongres-db-2 | 10.0.6.9:7433 | Leader | running | 4 | |
+-------------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster ongres-db, demoting current master ongres-db-2? [y/N]: y
2021-01-15 16:12:57.14561 Successfully switched over to "ongres-db-0"
+ Cluster: ongres-db (6918002883456245883) -------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+--------+---------+----+-----------+
| ongres-db-0 | 10.0.7.12:7433 | Leader | running | 4 | |
| ongres-db-2 | 10.0.6.9:7433 | | stopped | | unknown |
+-------------+----------------+--------+---------+----+-----------+
This will delete the pod ongres-db-2 and create the pod ongres-db-1:
$ kubectl get pods -n ongres-db
NAME READY STATUS RESTARTS AGE
distributedlogs-0 2/2 Running 0 3h19m
ongres-db-0 6/6 Running 0 4m51s
ongres-db-1 6/6 Running 0 41s
You can now delete the PVC and PV of ongres-db-2:
$ kubectl delete pvc -n ongres-db ongres-db-data-ongres-db-2
persistentvolumeclaim "ongres-db-data-ongres-db-2" deleted
$ kubectl delete pv pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148
persistentvolume "pvc-5124b9d2-ec35-46d7-9eda-7543d9ed7148" deleted
Now, your cluster will have the new, reduced disk size:
$ kubectl get pvc -n ongres-db
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
distributedlogs-data-distributedlogs-0 Bound pvc-9bab7a68-a209-4d9a-93f7-871a217a28b1 50Gi RWO standard 3h24m
ongres-db-data-ongres-db-0 Bound pvc-37d96872-b132-4a89-a579-d87f8cf1fa92 15Gi RWO gp2-data 9m21s
ongres-db-data-ongres-db-1 Bound pvc-46c1433b-26e8-422c-aecf-145b1bb5aac1 15Gi RWO gp2-data 5m11s
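As a final scripted check, you can list any data PVC of the cluster that is not at the target size. This is a minimal sketch under the assumption that the data PVCs share the ongres-db-data- prefix, as in the listing above; pvcs_wrong_size is a hypothetical helper.

```shell
# pvcs_wrong_size PREFIX SIZE: read `kubectl get pvc` output on stdin
# and print "name capacity" for each PVC whose name starts with PREFIX
# and whose CAPACITY column (assumed to be column 4) differs from SIZE.
pvcs_wrong_size() {
  awk -v prefix="$1" -v want="$2" \
    'index($1, prefix) == 1 && $4 != want { print $1, $4 }'
}

# Example (no output means every data PVC has the target size):
# kubectl get pvc -n ongres-db --no-headers | pvcs_wrong_size ongres-db-data- 15Gi
```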
Because you temporarily removed the validating webhook, you need to restart the StackGres operator pod:
kubectl delete pod -n stackgres -l app=stackgres-operator
Check that the pod started successfully:
kubectl get pod -n stackgres -l app=stackgres-operator
The output should look like:
NAME READY STATUS RESTARTS AGE
stackgres-operator-85df9c556c-c242s 1/1 Running 0 79s