This tutorial expects that you have pgBadger installed on your machine. Check the installation procedure to get it running properly.
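To confirm that pgBadger is available on your `PATH`, you can print the installed version with its `-V`/`--version` flag:

```
pgbadger --version
```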
Before you start, be sure that you have a StackGres cluster running that is configured to use a distributed logs server, as shown below:
```yaml
---
apiVersion: stackgres.io/v1
kind: SGDistributedLogs
metadata:
  namespace: default
  name: my-distributed-logs
spec:
  persistentVolume:
    size: 20Gi
```
Change the `size` according to your needs.
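Assuming you saved the manifest above to a file (the name below is illustrative), apply it with kubectl:

```
# sgdistributedlogs.yaml is a hypothetical file name for the manifest above
kubectl apply -f sgdistributedlogs.yaml
```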
To generate a pgBadger report, a few configuration parameters are necessary:
```yaml
---
apiVersion: stackgres.io/v1
kind: SGPostgresConfig
metadata:
  name: my-postgres-config
  namespace: default
spec:
  postgresVersion: "12"
  postgresql.conf:
    # Logging configuration for pgbadger
    log_checkpoints: 'on'
    log_connections: 'on'
    log_disconnections: 'on'
    log_lock_waits: 'on'
    log_temp_files: '0'
    # Adjust the minimum time to collect data
    log_min_duration_statement: '5s'
    log_autovacuum_min_duration: '0'
```
Check the pgBadger documentation for more details about the necessary parameters to set up Postgres.
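Apply this manifest in the same way (again, the file name is illustrative):

```
# sgpostgresconfig.yaml is a hypothetical file name for the manifest above
kubectl apply -f sgpostgresconfig.yaml
```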
The final StackGres cluster should look something like this:
```yaml
---
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: my-db-cluster
  namespace: default
spec:
  # ...
  configurations:
    sgPostgresConfig: my-postgres-config
  distributedLogs:
    sgDistributedLogs: my-distributed-logs
```
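Once the cluster is running with this configuration, you can check that the logging parameters took effect; this is a quick sketch, assuming the default `postgres-util` sidecar is enabled and the first pod is named `my-db-cluster-0`:

```
# assumes the postgres-util sidecar; prints the value configured above ('5s')
kubectl exec -it my-db-cluster-0 -c postgres-util -- psql -c "SHOW log_min_duration_statement;"
```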
Execute the command below to locate the pod of the distributed logs server:

```
$ kubectl get pods -o name -l stackgres.io/distributed-logs-name=my-distributed-logs
pod/my-distributed-logs-0
```
Connect to the distributed logs server and export the logs in CSV format:
```
QUERY=$(cat <<EOF
COPY (
  SELECT
    log_time,
    user_name,
    database_name,
    process_id,
    connection_from,
    session_id,
    session_line_num,
    command_tag,
    session_start_time,
    virtual_transaction_id,
    transaction_id,
    error_severity,
    sql_state_code,
    message,
    detail,
    hint,
    internal_query,
    internal_query_pos,
    context,
    query,
    query_pos,
    "location",
    application_name
  FROM log_postgres
) TO STDOUT CSV DELIMITER ',';
EOF
)
# no TTY (-t) here, so the redirected CSV output is not altered by carriage returns
kubectl exec pod/my-distributed-logs-0 -c patroni -- psql default_my-db-cluster -At -c "${QUERY}" > data.csv
```
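Optionally, run a quick sanity check on the exported file before feeding it to pgBadger:

```
# row count and a peek at the first record of the export
wc -l data.csv
head -n 1 data.csv
```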
Add a `WHERE` clause to the `SELECT` to filter the logs to the necessary period, like this:

```sql
...
WHERE log_time > 'begin timestamp' AND log_time < 'end timestamp'
```
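For example, with hypothetical timestamps for a one-day window (substitute your own range):

```sql
-- hypothetical one-day window; replace with your own range
WHERE log_time > '2021-06-01 00:00:00' AND log_time < '2021-06-02 00:00:00'
```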
With the CSV file, invoke pgBadger:
```
pgbadger --format csv --outfile pgbadger_report.html data.csv
```
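For large exports, pgBadger can parse the file with multiple parallel processes via its `-j`/`--jobs` option:

```
# use 4 parallel processes to speed up parsing
pgbadger -j 4 --format csv --outfile pgbadger_report.html data.csv
```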
pgBadger also supports defining an external command to fetch the log data. Using that, it is possible to create an all-in-one script that generates the pgBadger report:
```
POD=$(kubectl get pods -o name -l stackgres.io/distributed-logs-name=my-distributed-logs)
CLUSTER_NAME="my-db-cluster"
QUERY=$(cat <<EOF
COPY (
  SELECT
    log_time,
    user_name,
    database_name,
    process_id,
    connection_from,
    session_id,
    session_line_num,
    command_tag,
    session_start_time,
    virtual_transaction_id,
    transaction_id,
    error_severity,
    sql_state_code,
    message,
    detail,
    hint,
    internal_query,
    internal_query_pos,
    context,
    query,
    query_pos,
    "location",
    application_name
  FROM log_postgres
) TO STDOUT CSV DELIMITER ',';
EOF
)
# no TTY (-t) on the exec, so the CSV stream pgBadger reads is not altered by carriage returns
pgbadger \
  --format csv \
  --outfile pgbadger_report.html \
  --command "kubectl exec ${POD} -c patroni -- psql default_${CLUSTER_NAME} -At -c \"${QUERY}\""
```