Usually, maintenance tasks for Postgres require some degree of service disruption that may, in some cases, turn into an issue for your business.
In this runbook we’ll demonstrate a technique to achieve zero downtime for your maintenance tasks using StackGres and an external PgBouncer instance in transaction pooling mode.
IMPORTANT: When transaction pooling is used, clients must not rely on any session-based features, since each transaction may end up on a different connection and thus gets a different session state. Since version 1.21.0, PgBouncer provides some support for prepared statements in transaction pooling mode.
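For example, a minimal sketch of enabling that support would be to set max_prepared_statements in the [pgbouncer] section of pgbouncer.ini (available since PgBouncer 1.21.0; the value of 100 below is just an illustrative assumption):
[pgbouncer]
; track protocol-level prepared statements per client connection;
; a value of 0 disables the feature, 100 is an arbitrary illustrative choice
max_prepared_statements = 100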
In some cases Postgres has to be restarted in order for some parameter changes to take effect. This can be performed manually using the patronictl restart command. In other cases you need to re-create the Pod in order for some configuration changes to take place, for example after an upgrade of the StackGres operator or when a change to the SGCluster modifies the Pod. To re-create a Pod you may simply delete it. In both cases, however, a manual operation may be dangerous and may not take into account the order in which each Pod of the SGCluster has to be re-created. In general, the restart, security upgrade, or minor version upgrade SGDbOps are the way to go: they handle the operation smoothly, performing a controlled switchover of the primary instance when needed and following the right order to update all of the Pods that need to be updated.
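For illustration only, and assuming the SGCluster named cluster that is created later in this runbook (so the Patroni scope is cluster and the Pods are cluster-0 and cluster-1), the manual alternatives would look roughly like this:
# restart only the Postgres instance running in Pod cluster-0 via Patroni
kubectl exec -ti cluster-0 -c patroni -- patronictl restart cluster cluster-0 --force
# re-create a Pod so that changes to the Pod template take effect
kubectl delete pod cluster-1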
The switchover operation is not perfect, since it disconnects all the Postgres clients, forcing them to return an error to the final user. PgBouncer, when configured in transaction pooling mode, offers a mechanism to avoid that with the PAUSE and RESUME commands. Sending the PAUSE command with transaction pooling will wait for all the connections to complete their current transaction and then pause all of them. Sending the RESUME command will resume the connections so that they can continue to send transactions to the target Postgres instance. While connections are paused we can change the target Postgres instance, achieving a zero-downtime experience for our users.
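For reference, the same commands can be issued manually against the PgBouncer admin console, using the pgbouncer_admin credentials configured later in this runbook (run from a Pod that can reach the pgbouncer Service):
psql postgresql://pgbouncer_admin:pgb0unc3r@pgbouncer/pgbouncer -c PAUSE
# SHOW STATE reports whether the pooler is currently paused
psql postgresql://pgbouncer_admin:pgb0unc3r@pgbouncer/pgbouncer -tA -c 'SHOW STATE'
psql postgresql://pgbouncer_admin:pgb0unc3r@pgbouncer/pgbouncer -c RESUME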
To achieve this we create a PgBouncer instance (using the same image used by SGCluster for connection pooling) that targets the read-write Service of an SGCluster. The following snippet creates it together with a Service that will be used by applications to connect to PgBouncer:
cat << 'EOF' | kubectl replace --force -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: pgbouncer
        image: quay.io/ongres/pgbouncer:v1.22.1-build-6.33
        command:
        - sh
        - /usr/local/bin/start-pgbouncer.sh
        ports:
        - containerPort: 5432
          name: pgbouncer
          protocol: TCP
        volumeMounts:
        - name: dynamic
          mountPath: /etc/pgbouncer
        - name: config
          mountPath: /etc/pgbouncer/pgbouncer.ini
          subPath: pgbouncer.ini
        - name: config
          mountPath: /usr/local/bin/start-pgbouncer.sh
          subPath: start-pgbouncer.sh
      volumes:
      - name: dynamic
        emptyDir: {}
      - name: config
        configMap:
          defaultMode: 0444
          name: pgbouncer
          optional: false
---
apiVersion: v1
kind: Service
metadata:
  name: pgbouncer
spec:
  type: ClusterIP
  selector:
    app: pgbouncer
  ports:
  - name: pgbouncer
    port: 5432
    protocol: TCP
    targetPort: pgbouncer
EOF
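You can check that the Deployment and Service were created; note that the PgBouncer Pod will not become ready until the pgbouncer ConfigMap referenced by the config volume (created in the next step) exists:
kubectl get deployment,service pgbouncer
kubectl get pods -l app=pgbouncer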
The PgBouncer instance references a ConfigMap containing the configuration that targets the SGCluster primary Service. The ConfigMap also contains an initialization script that creates the credentials for the PgBouncer users and sets up an auth_query so that other users are authenticated against the Postgres instance:
cat << 'EOF' | kubectl replace --force -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgbouncer
data:
  pgbouncer.ini: |
    [databases]
    * = host=cluster port=5432
    [pgbouncer]
    listen_addr=0.0.0.0
    listen_port=5432
    pool_mode=transaction
    max_client_conn=1000
    default_pool_size=100
    max_db_connections=0
    max_user_connections=0
    auth_type=md5
    auth_file=/etc/pgbouncer/userlist.txt
    auth_user=postgres
    auth_query=SELECT usename, passwd FROM pg_shadow WHERE usename=$1
    admin_users=pgbouncer_admin
    stats_users=pgbouncer_stats
    application_name_add_host=1
    ignore_startup_parameters=extra_float_digits
    server_check_query=;
  start-pgbouncer.sh: |
    #!/bin/sh
    printf '"%s" "%s"\n' "postgres" "sup3rus3r" >> /etc/pgbouncer/userlist.txt
    printf '"%s" "%s"\n' "pgbouncer_admin" "pgb0unc3r" >> /etc/pgbouncer/userlist.txt
    printf '"%s" "%s"\n' "pgbouncer_stats" "pgb0unc3r" >> /etc/pgbouncer/userlist.txt
    exec pgbouncer /etc/pgbouncer/pgbouncer.ini
EOF
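Once the ConfigMap exists the PgBouncer Pod should start successfully; a quick way to verify this is:
kubectl rollout status deployment/pgbouncer
kubectl logs deployment/pgbouncer --tail=5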
We then create an SGCluster that has the logic to send the PAUSE and RESUME commands, thanks to the before_stop guard script and the on_role_change callback:
cat << 'EOF' | kubectl replace --force -f -
---
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: cluster
spec:
  configurations:
    patroni:
      initialConfig:
        postgresql:
          callbacks:
            on_role_change: /callbacks/on_role_change
            before_stop: /callbacks/before_stop
    credentials:
      users:
        superuser:
          password:
            name: credentials
            key: superuser-password
  instances: 2
  profile: development
  pods:
    customVolumeMounts:
      patroni:
      - name: custom-callbacks
        mountPath: /callbacks
    customVolumes:
    - name: callbacks
      configMap:
        defaultMode: 0775
        name: callbacks
        optional: false
    persistentVolume:
      size: 5Gi
  postgres:
    version: latest
---
apiVersion: v1
kind: Secret
metadata:
  name: credentials
stringData:
  superuser-password: sup3rus3r
EOF
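Note that the cluster Pods may not become ready until the callbacks ConfigMap referenced by the custom volume (created in the next step) exists; you can watch their progress with:
kubectl get pods --watch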
The scripts are mounted in a custom volume mount from a ConfigMap. The before_stop script is executed synchronously by Patroni and blocks the primary instance from being stopped by a switchover or a restart until the PAUSE command has been sent to the PgBouncer instance. This allows the connections to complete their ongoing transactions before the primary goes offline. The on_role_change script is executed asynchronously by Patroni and does not block the promotion of a primary. It waits for the instance to be promoted to primary and then sends the RESUME command, so that connections routed to the instance will be able to write to the primary:
cat << 'EOF' | kubectl replace --force -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: callbacks
data:
  before_stop: |
    #!/bin/sh
    set -x
    PATRONI_NAME="$(cat /etc/hostname)"
    PATRONI_HISTORY="$(patronictl history -f tsv | tail -n +2)"
    PATRONI_LIST="$(patronictl list -f tsv | tail -n +2)"
    if {
        [ "x$PATRONI_HISTORY" = x ] \
          && ! printf %s "$PATRONI_LIST" | grep -v $'^[^\t]\+\t'"$PATRONI_NAME"$'\t' | grep -q $'^[^\t]\+\t[^\t]\+\t[^\t]\+\tLeader\t'
      } \
      || printf %s "$PATRONI_HISTORY" | grep -q $'^[^\t]\+\t[^\t]\+\t[^\t]\+\t[^\t]\+\t'"$PATRONI_NAME"'$'
    then
      psql postgresql://pgbouncer_admin:pgb0unc3r@pgbouncer/pgbouncer -c PAUSE
    fi
    exit 0
  on_role_change: |
    #!/bin/sh
    set -x
    if [ "$#" = 0 ] || [ "x$2" = xmaster ]
    then
      until psql -tA -c 'SELECT pg_is_in_recovery()' | grep -qxF f
      do
        true
      done
      psql postgresql://pgbouncer_admin:pgb0unc3r@pgbouncer/pgbouncer -c RESUME
      psql postgresql://pgbouncer_admin:pgb0unc3r@pgbouncer/pgbouncer -tA -c 'SHOW STATE' | grep -q 'paused|no'
    fi
EOF
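Once the ConfigMap is in place both cluster Pods should become ready. Assuming the default Pod naming for an SGCluster called cluster, you can verify that the callback scripts are mounted and that connections flow through PgBouncer to the primary:
kubectl wait pod cluster-0 cluster-1 --for=condition=Ready --timeout=300s
kubectl exec cluster-0 -c patroni -- ls -l /callbacks
# optional end-to-end check through PgBouncer using the superuser credentials defined above
kubectl run psql --rm -it --restart=Never \
  --image=quay.io/ongres/postgres-util:v16.3-build-6.34 \
  -- psql postgresql://postgres:sup3rus3r@pgbouncer/postgres -c 'SELECT version()'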
Now, to demonstrate the effectiveness of this deployment, let’s create the following Job that will run pgbench against the PgBouncer instance:
cat << 'EOF' | kubectl replace --force -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: pgbench
spec:
  template:
    spec:
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 0
      containers:
      - name: pgbench
        image: quay.io/ongres/postgres-util:v16.3-build-6.34
        command:
        - sh
        - -c
        - |
          pgbench postgresql://postgres:sup3rus3r@pgbouncer/postgres -i
          pgbench postgresql://postgres:sup3rus3r@pgbouncer/postgres -T 300 -c 4 -j 4 -P 2 --progress-timestamp
EOF
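You can follow the benchmark output in a separate terminal:
kubectl wait pod -l job-name=pgbench --for=condition=Ready --timeout=120s
kubectl logs job/pgbench -f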
Wait for the Job to start and print some progress of the benchmark, then create a restart SGDbOps that will restart the replica, perform a switchover, and then restart the primary:
cat << 'EOF' | kubectl replace --force -f -
apiVersion: stackgres.io/v1
kind: SGDbOps
metadata:
  name: restart
spec:
  op: restart
  sgCluster: cluster
  restart:
    method: InPlace
EOF
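While the operation runs you can watch the replica being restarted, the switchover taking place, and finally the old primary being restarted:
# in one terminal, follow the operation status
kubectl get sgdbops restart --watch
# in another, watch the cluster Pods being restarted in order
kubectl get pods --watch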
After the operation is completed, wait for the Job to complete and check that no errors were raised:
kubectl wait sgdbops restart --for=condition=Completed --timeout=600s
kubectl wait job pgbench --for=condition=Complete --timeout=600s
Check the pgbench Job’s logs: you should not find any failed connection!
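For example, you can scan the Job output for connection errors and check the failed-transaction counter that pgbench reports at the end of the run:
# expect "number of failed transactions: 0" and no connection errors
kubectl logs job/pgbench | grep -iE 'error|failed transactions'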