SGShardedDbOps allows you to perform day-2 database operations on sharded clusters, including restarts, resharding, and security upgrades.
The `restart` and `securityUpgrade` operations are logically equivalent, since the SGShardedCluster version is updated on any restart. These operations can also be performed without creating an SGShardedDbOps by using the rollout functionality, which lets the operator automatically roll out Pod updates based on the cluster's update strategy.
| Operation | Description | Use Case |
|---|---|---|
| `restart` | Rolling restart of all pods | Apply configuration changes, clear memory |
| `resharding` | Rebalance data across shards | After adding shards, optimize distribution |
| `securityUpgrade` | Upgrade to the latest security patches | Apply security fixes |

Restart all pods in the sharded cluster:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: cluster-restart
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
```
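If you generate these manifests from scripts, a small helper keeps their shape consistent. The sketch below is hypothetical (the function name and its validation rules are assumptions, not part of StackGres): it builds the same structure as the YAML above as a Python dict, placing operation-specific settings under a key named after the operation, and prints it as JSON, which `kubectl apply -f -` also accepts.

```python
import json

# Hypothetical helper: builds an SGShardedDbOps manifest as a plain dict.
# Field names mirror the YAML examples on this page; the validation below is
# a simplified assumption, not the operator's own admission logic.
VALID_OPS = {"restart", "resharding", "securityUpgrade"}
VALID_METHODS = {"InPlace", "ReducedImpact"}


def sharded_dbops_manifest(name, cluster, op, **op_spec):
    """Return an SGShardedDbOps manifest dict for the given operation."""
    if op not in VALID_OPS:
        raise ValueError(f"unsupported op: {op!r}")
    method = op_spec.get("method")
    if method is not None and method not in VALID_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    spec = {"sgShardedCluster": cluster, "op": op}
    if op_spec:
        # Operation-specific settings live under a key named after the op,
        # e.g. spec.restart.method or spec.resharding.citus.threshold.
        spec[op] = op_spec
    return {
        "apiVersion": "stackgres.io/v1",
        "kind": "SGShardedDbOps",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = sharded_dbops_manifest(
        "reduced-impact-restart", "my-sharded-cluster", "restart",
        method="ReducedImpact",
    )
    # Pipe the JSON to kubectl, e.g.: python make_ops.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

A resharding manifest passes its nested settings the same way, for example `citus={"threshold": 0.1}`.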
The `InPlace` method restarts pods without creating additional replicas. It is faster but may cause brief unavailability:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: inplace-restart
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  restart:
    method: InPlace
```
The `ReducedImpact` method creates a new replica before restarting each pod, minimizing impact:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: reduced-impact-restart
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  restart:
    method: ReducedImpact
```
Restart only the pods that require a restart (e.g., after a configuration change) by setting `onlyPendingRestart: true`:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: pending-restart
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  restart:
    method: ReducedImpact
    onlyPendingRestart: true
```
Resharding rebalances data distribution across shards. This is essential after adding new shards.
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: rebalance-shards
spec:
  sgShardedCluster: my-sharded-cluster
  op: resharding
  resharding:
    citus:
      threshold: 0.1 # Rebalance if nodes differ by 10% in utilization
```
The threshold determines when rebalancing occurs based on utilization difference:
| Threshold | Behavior |
|---|---|
| `0.0` | Always rebalance (aggressive) |
| `0.1` | Rebalance if >10% difference |
| `0.2` | Rebalance if >20% difference |
| `1.0` | Never rebalance |

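The threshold can be read as a tolerance band around the average node utilization. The sketch below is a simplified model of that rule, assuming (as Citus describes its rebalancer `threshold`) that a node counts as unbalanced when its utilization deviates from the cluster average by more than `threshold` times that average. The function name and inputs are hypothetical, and the real rebalancer plans individual shard moves instead of returning a yes/no answer.

```python
def needs_rebalance(utilization, threshold):
    """Return True if any node falls outside avg * (1 +/- threshold).

    `utilization` holds one number per node: the shard count under
    by_shard_count, or bytes on disk under by_disk_size.
    """
    if not utilization:
        return False
    avg = sum(utilization) / len(utilization)
    lower = avg * (1 - threshold)
    upper = avg * (1 + threshold)
    return any(u < lower or u > upper for u in utilization)


# Three equally loaded shards plus a freshly added, empty one.
print(needs_rebalance([10, 10, 10, 0], 0.1))
# A small imbalance: caught at 0.0, tolerated at 0.2.
print(needs_rebalance([10, 10, 11], 0.0), needs_rebalance([10, 10, 11], 0.2))
```

Under this model an empty, newly added shard pushes the cluster outside any small threshold, which is why resharding matters after scaling out.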
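Move all data off shards that are about to be removed. With `drainOnly: true`, Citus only moves shards away from worker nodes marked with `shouldhaveshards = false` in `pg_dist_node`, and moves no other shards: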
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: drain-shards
spec:
  sgShardedCluster: my-sharded-cluster
  op: resharding
  resharding:
    citus:
      drainOnly: true
```
Use a specific Citus rebalance strategy:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: custom-rebalance
spec:
  sgShardedCluster: my-sharded-cluster
  op: resharding
  resharding:
    citus:
      threshold: 0.1
      rebalanceStrategy: by_disk_size
```
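`by_disk_size` and the default `by_shard_count` can disagree about whether a cluster is balanced. The sketch below compares them on a made-up placement in which every node holds four shards but the shards on one node are four times larger. The node names and sizes are hypothetical, and the ±threshold check is a simplified model of the Citus rebalancer, not its actual planner.

```python
# Hypothetical shard placement: node -> list of shard sizes in GB.
placement = {
    "node-a": [5, 5, 5, 5],
    "node-b": [5, 5, 5, 5],
    "node-c": [20, 20, 20, 20],
}


def utilization(placement, strategy):
    """Per-node load as each strategy would measure it (simplified)."""
    if strategy == "by_shard_count":
        return {node: len(sizes) for node, sizes in placement.items()}
    if strategy == "by_disk_size":
        return {node: sum(sizes) for node, sizes in placement.items()}
    raise ValueError(f"unknown strategy: {strategy!r}")


def is_balanced(loads, threshold):
    """True if every node is within avg * (1 +/- threshold)."""
    avg = sum(loads.values()) / len(loads)
    return all(avg * (1 - threshold) <= v <= avg * (1 + threshold)
               for v in loads.values())


for strategy in ("by_shard_count", "by_disk_size"):
    loads = utilization(placement, strategy)
    print(strategy, loads, "balanced" if is_balanced(loads, 0.1) else "unbalanced")
```

Counting shards, the three nodes look balanced (4 each); measuring disk, they hold 20, 20, and 80 GB, so the same cluster is unbalanced at a `0.1` threshold.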
Available strategies depend on Citus version:
- `by_shard_count`: Balance number of shards (default)
- `by_disk_size`: Balance disk usage

Apply security patches without changing PostgreSQL version:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: security-upgrade
spec:
  sgShardedCluster: my-sharded-cluster
  op: securityUpgrade
  securityUpgrade:
    method: ReducedImpact
```
Schedule an operation for a future time:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: scheduled-restart
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  runAt: "2024-01-20T03:00:00Z" # Run at 3 AM UTC
  restart:
    method: ReducedImpact
```
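`runAt` takes a UTC timestamp in the format shown above. The sketch below uses only the Python standard library to check such a value before you apply the manifest, for example to catch a typo or a date already in the past. The helper names are hypothetical, and the check only accepts the `...Z` form used on this page, not every format the operator may accept.

```python
from datetime import datetime, timezone


def parse_run_at(value):
    """Parse a runAt value such as "2024-01-20T03:00:00Z" into an aware datetime.

    The trailing "Z" is rewritten to "+00:00" because datetime.fromisoformat
    only accepts "Z" on Python 3.11 and later.
    """
    if not value.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z'")
    return datetime.fromisoformat(value[:-1] + "+00:00")


def is_in_future(value, now=None):
    """True if the scheduled time is later than `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_run_at(value) > now


print(parse_run_at("2024-01-20T03:00:00Z").isoformat())
```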
Set a maximum duration for the operation:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: restart-with-timeout
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  timeout: PT2H # Fail if not completed in 2 hours
  restart:
    method: ReducedImpact
```
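`timeout` uses an ISO 8601 duration (`PT2H` is two hours). The sketch below converts the time-only durations used on this page into a `timedelta` and computes when an operation would be cut off, assuming the timeout is measured from `status.opStarted`. The helper names are hypothetical, and the parser deliberately rejects day, month, and year components.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the time-only subset of ISO 8601 durations, e.g. PT2H, PT4H,
# PT1H30M, PT90S. Durations with days, months, or years are not handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_timeout(value):
    """Convert a duration such as "PT2H" into a timedelta."""
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def deadline(op_started, timeout):
    """Latest completion time given status.opStarted and spec.timeout."""
    started = datetime.fromisoformat(op_started.replace("Z", "+00:00"))
    return started + parse_timeout(timeout)


print(deadline("2024-01-15T10:00:00Z", "PT2H").isoformat())
```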
Configure automatic retries on failure:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: restart-with-retry
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  maxRetries: 3
  restart:
    method: ReducedImpact
```
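Assuming `maxRetries` counts re-runs after the first failed attempt (so `maxRetries: 3` allows up to four attempts in total) and that `status.opRetries` reports how many re-runs have been used, the retry behavior can be modeled as below. This is an illustration of those assumed semantics with a hypothetical function name, not the operator's code.

```python
def run_with_retries(attempt, max_retries):
    """Call `attempt()` until it succeeds or retries are exhausted.

    Returns (result, retries), where `retries` plays the role of
    status.opRetries: the number of re-runs after the first failure.
    The last exception is re-raised once max_retries re-runs have failed.
    """
    retries = 0
    while True:
        try:
            return attempt(), retries
        except Exception:
            if retries >= max_retries:
                raise
            retries += 1
```

For example, an operation that fails twice with a transient error and then succeeds completes with `maxRetries: 3` after two retries; with `maxRetries: 1` it would fail.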
Monitor operations with `kubectl`:

```bash
# List all operations
kubectl get sgshardeddbops
# View detailed status
kubectl get sgshardeddbops cluster-restart -o yaml
```
The `status` section of the output looks similar to the following:

```yaml
status:
  conditions:
    - type: Running
      status: "True"
      reason: OperationRunning
    - type: Completed
      status: "False"
    - type: Failed
      status: "False"
  opStarted: "2024-01-15T10:00:00Z"
  opRetries: 0
  restart:
    pendingToRestartSgClusters:
      - my-sharded-cluster-shard1
    restartedSgClusters:
      - my-sharded-cluster-coord
      - my-sharded-cluster-shard0
```
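The two restart lists make it easy to report progress. The sketch below reads the `.status` object (here the example above, written as the JSON that `kubectl get sgshardeddbops cluster-restart -o json` would contain under `.status`) and counts restarted SGClusters. It assumes each SGCluster of the sharded cluster appears in exactly one of the two lists; the function name is hypothetical.

```python
import json

# The .status object from the example above, as JSON.
STATUS_JSON = """
{
  "conditions": [
    {"type": "Running", "status": "True", "reason": "OperationRunning"},
    {"type": "Completed", "status": "False"},
    {"type": "Failed", "status": "False"}
  ],
  "opStarted": "2024-01-15T10:00:00Z",
  "opRetries": 0,
  "restart": {
    "pendingToRestartSgClusters": ["my-sharded-cluster-shard1"],
    "restartedSgClusters": ["my-sharded-cluster-coord", "my-sharded-cluster-shard0"]
  }
}
"""


def restart_progress(status):
    """Return (restarted, total) SGClusters for a restart operation."""
    restart = status.get("restart", {})
    done = len(restart.get("restartedSgClusters", []))
    pending = len(restart.get("pendingToRestartSgClusters", []))
    return done, done + pending


status = json.loads(STATUS_JSON)
done, total = restart_progress(status)
print(f"{done}/{total} SGClusters restarted")
```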
| Condition | Description |
|---|---|
| `Running` | Operation is in progress |
| `Completed` | Operation finished successfully |
| `Failed` | Operation failed |
| `OperationTimedOut` | Operation exceeded timeout |

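Scripts that must block until an operation finishes can turn the condition table into a polling loop. The sketch below assumes the condition names in the table are the `type` values reported under `status.conditions`, and treats `Completed`, `Failed`, and `OperationTimedOut` as terminal. The function names are hypothetical; `fetch` defaults to calling `kubectl get ... -o json` and can be swapped for a stub when testing without a cluster.

```python
import json
import subprocess
import time

TERMINAL = frozenset({"Completed", "Failed", "OperationTimedOut"})


def true_conditions(status):
    """Names of conditions whose status is the string "True"."""
    return {c["type"] for c in status.get("conditions", [])
            if c.get("status") == "True"}


def kubectl_status(name):
    """Fetch .status of an SGShardedDbOps via kubectl (needs cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "sgshardeddbops", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("status", {})


def wait_for_result(name, fetch=kubectl_status, interval=10.0, max_polls=720):
    """Poll until a terminal condition is True and return the set of them.

    Raises TimeoutError if none appears within max_polls polls
    (720 polls at 10 s is roughly two hours).
    """
    for _ in range(max_polls):
        done = true_conditions(fetch(name)) & TERMINAL
        if done:
            return done
        time.sleep(interval)
    raise TimeoutError(f"{name}: no terminal condition after {max_polls} polls")
```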
Watch an operation's progress interactively:

```bash
kubectl get sgshardeddbops cluster-restart -w
```
Control where operation pods run:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: scheduled-maintenance
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  scheduling:
    nodeSelector:
      node-type: maintenance
    tolerations:
      - key: maintenance
        operator: Exists
        effect: NoSchedule
```
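The toleration above uses `operator: Exists`, which in Kubernetes matches any taint with the same key (and, because `effect` is set, the same effect) regardless of the taint's value. The sketch below models that standard Kubernetes matching rule to show which node taints the operation pods would tolerate. It covers only the `Exists` and `Equal` operators and ignores `tolerationSeconds`; the function name and taint dictionaries are hypothetical examples.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint.

    Simplified Kubernetes rules: an empty effect matches every effect,
    an empty key with operator Exists matches every key, Exists ignores
    the value, and Equal (the default) requires equal keys and values.
    """
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        return key == "" or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


maintenance = {"key": "maintenance", "operator": "Exists", "effect": "NoSchedule"}
print(tolerates(maintenance, {"key": "maintenance", "value": "true", "effect": "NoSchedule"}))
```

Note that the toleration only allows scheduling onto tainted nodes; the `nodeSelector` is what restricts the pods to nodes labeled `node-type: maintenance`.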
After adding shards, rebalance data:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: post-scale-rebalance
spec:
  sgShardedCluster: my-sharded-cluster
  op: resharding
  resharding:
    citus:
      threshold: 0.0 # Force rebalance
```
Schedule a restart during a maintenance window:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: maintenance-restart
spec:
  sgShardedCluster: my-sharded-cluster
  op: restart
  runAt: "2024-01-21T02:00:00Z"
  timeout: PT4H
  restart:
    method: ReducedImpact
    onlyPendingRestart: true
```
Apply an urgent security update:
```yaml
apiVersion: stackgres.io/v1
kind: SGShardedDbOps
metadata:
  name: urgent-security-upgrade
spec:
  sgShardedCluster: my-sharded-cluster
  op: securityUpgrade
  securityUpgrade:
    method: InPlace # Faster for urgent patches
```
To cancel a running operation, delete the resource:
```bash
kubectl delete sgshardeddbops cluster-restart
```
Note: Cancellation may leave the cluster in an intermediate state. Review cluster status after cancellation.
Use `runAt` to schedule operations during maintenance windows.