This tutorial expects that you have pgbadger installed on your machine. Check the installation procedure to get it running properly.
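For example, a quick way to confirm that the binary is available on your PATH:
pgbadger --version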
Before starting, make sure that you have an SGCluster running that uses an SGDistributedLogs server, like below:
---
apiVersion: stackgres.io/v1beta1
kind: SGDistributedLogs
metadata:
  namespace: default
  name: my-distributed-logs
spec:
  persistentVolume:
    size: 20Gi
Remember to change the size according to your needs.
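If you don't have one yet, save the manifest above (for example as my-distributed-logs.yaml, the file name is up to you) and apply it:
kubectl apply -f my-distributed-logs.yaml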
To generate a pgbadger report, a few configuration parameters are necessary:
---
apiVersion: stackgres.io/v1beta1
kind: SGPostgresConfig
metadata:
  name: my-postgres-config
  namespace: default
spec:
  postgresVersion: "12"
  postgresql.conf:
    # Logging configuration for pgbadger
    log_checkpoints: 'on'
    log_connections: 'on'
    log_disconnections: 'on'
    log_lock_waits: 'on'
    log_temp_files: '0'
    # Adjust the minimum time to collect data
    log_min_duration_statement: '5s'
    log_autovacuum_min_duration: '0'
Check the pgbadger documentation for more details about the parameters needed to set up Postgres.
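Save this configuration (for example as my-postgres-config.yaml) and apply it the same way:
kubectl apply -f my-postgres-config.yaml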
The final SGCluster should be something like this:
---
apiVersion: stackgres.io/v1beta1
kind: SGCluster
metadata:
  name: my-db-cluster
  namespace: default
spec:
  # ...
  configurations:
    sgPostgresConfig: my-postgres-config
  distributedLogs:
    sgDistributedLogs: my-distributed-logs
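Assuming the cluster manifest is saved as my-db-cluster.yaml (the file name is an example), apply it and confirm that the cluster references both resources, e.g. with a jsonpath query based on the fields shown above:
kubectl apply -f my-db-cluster.yaml
kubectl get sgcluster my-db-cluster -n default \
  -o jsonpath='{.spec.configurations.sgPostgresConfig}{" "}{.spec.distributedLogs.sgDistributedLogs}{"\n"}'
# my-postgres-config my-distributed-logs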
Execute the command below to locate the pod of the distributed logs server:
kubectl get pods -o name -l distributed-logs-name=my-distributed-logs
# pod/my-distributed-logs-0
Connect to the distributed logs server and export the logs in CSV format:
QUERY=$(cat <<EOF
COPY (
  SELECT
    log_time,
    user_name,
    database_name,
    process_id,
    connection_from,
    session_id,
    session_line_num,
    command_tag,
    session_start_time,
    virtual_transaction_id,
    transaction_id,
    error_severity,
    sql_state_code,
    message,
    detail,
    hint,
    internal_query,
    internal_query_pos,
    context,
    query,
    query_pos,
    "location",
    application_name
  FROM log_postgres
) TO STDOUT CSV DELIMITER ',';
EOF
)
kubectl exec -it pod/my-distributed-logs-0 -c patroni -- psql default_my-db-cluster -At -c "${QUERY}" > data.csv
Add a WHERE clause to the SELECT to filter the logs to the necessary period, like this:
  ...
  FROM log_postgres
  WHERE log_time > 'begin timestamp' AND log_time < 'end timestamp'
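Before generating the report, you may want to confirm that the export actually produced rows:
wc -l data.csv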
With the CSV file, just call pgbadger:
pgbadger --format csv --outfile pgbadger_report.html data.csv
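The result is a single self-contained HTML file that you can open in any browser, for example on Linux:
xdg-open pgbadger_report.html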
pgbadger supports using an external command to fetch the log data, which makes it possible to create an all-in-one script that generates the pgbadger report.
POD=$(kubectl get pods -o name -l distributed-logs-name=my-distributed-logs)
CLUSTER_NAME="my-db-cluster"
QUERY=$(cat <<EOF
COPY (
  SELECT
    log_time,
    user_name,
    database_name,
    process_id,
    connection_from,
    session_id,
    session_line_num,
    command_tag,
    session_start_time,
    virtual_transaction_id,
    transaction_id,
    error_severity,
    sql_state_code,
    message,
    detail,
    hint,
    internal_query,
    internal_query_pos,
    context,
    query,
    query_pos,
    "location",
    application_name
  FROM log_postgres
) TO STDOUT CSV DELIMITER ',';
EOF
)
pgbadger \
--format csv \
--outfile pgbadger_report.html \
--command "kubectl exec -it ${POD} -c patroni -- psql default_${CLUSTER_NAME} -At -c \"${QUERY}\""