This tutorial assumes that you have pgbadger installed on your machine. Check its installation procedure to get it running properly.
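A quick check that it is available (assuming `pgbadger` is on your `PATH`):

```bash
pgbadger --version
# should print something like: pgBadger version 11.8
```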
Before starting, make sure that you have an SGCluster running that uses an SGDistributedLogs server, like below:
```yaml
---
apiVersion: stackgres.io/v1
kind: SGDistributedLogs
metadata:
  namespace: default
  name: my-distributed-logs
spec:
  persistentVolume:
    size: 20Gi
```
Remember to change the `size` according to your needs.
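To create the distributed logs server, apply the manifest (a minimal sketch, assuming you saved it as `my-distributed-logs.yaml`):

```bash
kubectl apply -f my-distributed-logs.yaml
kubectl get sgdistributedlogs -n default   # wait until the resource is listed
```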
To generate a pgbadger report, a few configuration parameters are necessary:
```yaml
---
apiVersion: stackgres.io/v1
kind: SGPostgresConfig
metadata:
  name: my-postgres-config
  namespace: default
spec:
  postgresVersion: "12"
  postgresql.conf:
    # Logging configuration for pgbadger
    log_checkpoints: 'on'
    log_connections: 'on'
    log_disconnections: 'on'
    log_lock_waits: 'on'
    log_temp_files: '0'
    # Log every statement that runs longer than this; lower it to collect more data
    log_min_duration_statement: '5s'
    log_autovacuum_min_duration: '0'
```
Check the pgbadger documentation for more details about the parameters necessary to set up Postgres.
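Apply the configuration so the cluster can reference it (assuming the manifest is saved as `my-postgres-config.yaml`):

```bash
kubectl apply -f my-postgres-config.yaml
```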
The final SGCluster should look something like this:
```yaml
---
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: my-db-cluster
  namespace: default
spec:
  # ...
  configurations:
    sgPostgresConfig: my-postgres-config
  distributedLogs:
    sgDistributedLogs: my-distributed-logs
```
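A sketch of applying the cluster manifest and confirming that the logging parameters took effect; the manifest file name and the primary pod name `my-db-cluster-0` are assumptions for illustration:

```bash
kubectl apply -f my-db-cluster.yaml
# once the pod is ready, check a logging parameter on the primary (pod name assumed)
kubectl exec -it my-db-cluster-0 -c patroni -- psql -c 'SHOW log_min_duration_statement;'
```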
Execute the command below to locate the pod of the distributed log server:
```bash
kubectl get pods -o name -l distributed-logs-name=my-distributed-logs
# pod/my-distributed-logs-0
```
Connect to the distributed logs server and export the logs in CSV format:
```bash
QUERY=$(cat <<EOF
COPY (
  SELECT
    log_time,
    user_name,
    database_name,
    process_id,
    connection_from,
    session_id,
    session_line_num,
    command_tag,
    session_start_time,
    virtual_transaction_id,
    transaction_id,
    error_severity,
    sql_state_code,
    message,
    detail,
    hint,
    internal_query,
    internal_query_pos,
    context,
    query,
    query_pos,
    "location",
    application_name
  FROM log_postgres
) TO STDOUT CSV DELIMITER ',';
EOF
)
# use -i (no TTY) so the redirected CSV does not pick up carriage returns
kubectl exec -i pod/my-distributed-logs-0 -c patroni -- psql default_my-db-cluster -At -c "${QUERY}" > data.csv
```
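A quick sanity check that the export actually produced rows:

```bash
wc -l data.csv       # number of exported log lines
head -n 3 data.csv   # eyeball the first rows of the CSV
```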
Add a `WHERE` clause to the `SELECT` to filter the logs to the period you need, like this:

```sql
...
  WHERE log_time > 'begin timestamp' AND log_time < 'end timestamp'
```
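If you prefer not to edit the query by hand, here is a sketch that rewrites the `QUERY` variable defined above using bash parameter substitution; the one-day interval is only an illustration:

```bash
# insert a time filter right after the FROM clause (adjust the interval as needed)
QUERY="${QUERY/FROM log_postgres/FROM log_postgres WHERE log_time > now() - interval '1 day'}"
kubectl exec -i pod/my-distributed-logs-0 -c patroni -- psql default_my-db-cluster -At -c "${QUERY}" > data.csv
```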
With the CSV file in hand, just call pgbadger:
```bash
pgbadger --format csv --outfile pgbadger_report.html data.csv
```
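pgbadger can also narrow the report to a time window and parse in parallel; a sketch with commonly used options (the timestamps are placeholders):

```bash
pgbadger \
  --format csv \
  --jobs 4 \
  --begin '2023-01-01 00:00:00' \
  --end '2023-01-02 00:00:00' \
  --outfile pgbadger_report.html \
  data.csv
```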
pgbadger also supports reading the log from an external command, which makes it possible to create an all-in-one script that generates the report:
```bash
# pod of the distributed logs server
POD=$(kubectl get pods -o name -l distributed-logs-name=my-distributed-logs)
CLUSTER_NAME="my-db-cluster"
QUERY=$(cat <<EOF
COPY (
  SELECT
    log_time,
    user_name,
    database_name,
    process_id,
    connection_from,
    session_id,
    session_line_num,
    command_tag,
    session_start_time,
    virtual_transaction_id,
    transaction_id,
    error_severity,
    sql_state_code,
    message,
    detail,
    hint,
    internal_query,
    internal_query_pos,
    context,
    query,
    query_pos,
    "location",
    application_name
  FROM log_postgres
) TO STDOUT CSV DELIMITER ',';
EOF
)
# pgbadger reads the log entries from the command's stdout; no TTY (-t) here
pgbadger \
  --format csv \
  --outfile pgbadger_report.html \
  --command "kubectl exec -i ${POD} -c patroni -- psql default_${CLUSTER_NAME} -At -c \"${QUERY}\""
```
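Saved as a single script (for example `generate_report.sh`, a name picked here for illustration), this generates the report in one step, without writing the intermediate `data.csv` file.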