This guide covers common issues encountered with SGStream and their solutions.
# Get detailed status
kubectl get sgstream my-stream -o yaml
# Check conditions
kubectl get sgstream my-stream -o jsonpath='{.status.conditions}' | jq
# Check failure message
kubectl get sgstream my-stream -o jsonpath='{.status.failure}'
# Find stream pod
kubectl get pods -l stackgres.io/stream-name=my-stream
# Describe pod for events
kubectl describe pod -l stackgres.io/stream-name=my-stream
# Check logs
kubectl logs -l stackgres.io/stream-name=my-stream --tail=100
# Check events related to the stream
kubectl get events --field-selector involvedObject.name=my-stream --sort-by='.lastTimestamp'
Stream pod is in CrashLoopBackOff or Error state.
1. Source database not accessible
# Check connectivity from cluster
kubectl run test-connection --rm -it --image=postgres:16 -- \
psql -h source-cluster -U postgres -c "SELECT 1"
Solution: Verify network policies, service names, and credentials.
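If the connectivity test fails, the following checks help narrow it down (assuming the source service is named source-cluster and runs in the current namespace):
# Look for NetworkPolicies that could block traffic to the source
kubectl get networkpolicy
# Confirm the source service exists and resolves
kubectl get svc source-cluster
kubectl run dns-test --rm -it --image=busybox -- nslookup source-cluster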
2. Invalid credentials
# Verify secret exists
kubectl get secret stream-credentials
# Check secret contents
kubectl get secret stream-credentials -o jsonpath='{.data.password}' | base64 -d
Solution: Update the secret with correct credentials.
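If the values are wrong, recreate the secret; the key names username and password below are illustrative and must match whatever the SGStream spec references:
kubectl create secret generic stream-credentials \
  --from-literal=username=postgres \
  --from-literal=password='the-correct-password' \
  --dry-run=client -o yaml | kubectl apply -f -
# Restart the stream pod so it picks up the updated secret
kubectl delete pod -l stackgres.io/stream-name=my-stream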
3. Logical replication not enabled
# Check wal_level on source
kubectl exec source-cluster-0 -c postgres-util -- psql -c "SHOW wal_level"
Solution: For external PostgreSQL, set wal_level = logical and restart.
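On an external source this can be done with ALTER SYSTEM (source-host is a placeholder); the new value only takes effect after a server restart:
psql -h source-host -U postgres -c "ALTER SYSTEM SET wal_level = logical"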
4. Insufficient replication slots
# Check max_replication_slots
kubectl exec source-cluster-0 -c postgres-util -- psql -c "SHOW max_replication_slots"
# Check current slots
kubectl exec source-cluster-0 -c postgres-util -- psql -c "SELECT * FROM pg_replication_slots"
Solution: Increase max_replication_slots in PostgreSQL configuration.
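For a StackGres-managed source the parameter lives in the cluster's SGPostgresConfig (postgres-config is a placeholder name); like wal_level, it only takes effect after a PostgreSQL restart:
kubectl patch sgpostgresconfig postgres-config --type merge \
  -p '{"spec":{"postgresql.conf":{"max_replication_slots":"20"}}}'
On an external source, ALTER SYSTEM SET max_replication_slots = 20 works the same way.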
Error: replication slot "xxx" already exists
Check whether another SGStream already uses the slot:
kubectl get sgstream --all-namespaces
If the slot is orphaned and not used by any stream, drop it on the source:
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
  "SELECT pg_drop_replication_slot('orphaned_slot_name')"
Alternatively, configure the stream to use a unique slot name:
spec:
  source:
    sgCluster:
      debeziumProperties:
        slotName: unique_slot_name
Error: publication "xxx" already exists
Reuse the existing publication instead of creating a new one:
spec:
  source:
    sgCluster:
      debeziumProperties:
        publicationName: existing_publication
        publicationAutocreateMode: disabled
Or drop the orphaned publication on the source:
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
  "DROP PUBLICATION orphaned_publication"
milliSecondsBehindSource keeps increasing.
1. Target can’t keep up
Increase batch size and tune connection pool:
spec:
  target:
    sgCluster:
      debeziumProperties:
        batchSize: 1000
        connectionPoolMax_size: 64
        useReductionBuffer: true
2. Network latency
Check network between source and target:
kubectl exec stream-pod -- ping target-cluster
3. Insufficient resources
Increase stream pod resources:
spec:
  pods:
    resources:
      requests:
        cpu: 2000m
        memory: 2Gi
      limits:
        cpu: 4000m
        memory: 4Gi
4. Large transactions
For bulk operations, consider:
spec:
  source:
    sgCluster:
      debeziumProperties:
        maxBatchSize: 8192
        maxQueueSize: 32768
Source database running out of disk space due to WAL accumulation.
# Check how much WAL each replication slot is retaining
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
  "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
   FROM pg_replication_slots"
If the slot is inactive because the stream has stalled, restart the stream pod:
kubectl delete pod -l stackgres.io/stream-name=my-stream
Configure a heartbeat so the replication slot keeps advancing even when the captured tables are idle:
spec:
  source:
    sgCluster:
      debeziumProperties:
        heartbeatIntervalMs: 30000
# Only if stream can be recreated
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
"SELECT pg_drop_replication_slot('stuck_slot')"
Snapshot phase runs for extended periods.
Increase snapshot parallelism and fetch size:
spec:
  source:
    sgCluster:
      debeziumProperties:
        snapshotMaxThreads: 4
        snapshotFetchSize: 20000
Limit the snapshot to the tables you actually need:
spec:
  source:
    sgCluster:
      includes:
      - "public\\.important_table"
      debeziumProperties:
        snapshotIncludeCollectionList:
        - "public\\.important_table"
Or skip the initial snapshot entirely and only stream new changes:
spec:
  source:
    sgCluster:
      debeziumProperties:
        snapshotMode: no_data # Skip initial snapshot
Then trigger incremental snapshots via signals.
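A sketch of how an incremental snapshot could be requested, assuming the annotation-based signal mechanism shown in the graceful-shutdown section also accepts Debezium's standard execute-snapshot signal (the annotation name and payload below are assumptions, not a confirmed API):
kubectl annotate sgstream my-stream \
  debezium-signal.stackgres.io/execute-snapshot='{"data-collections": ["public.important_table"], "type": "incremental"}'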
Errors about unsupported or mismatched data types.
Tolerate unknown data types and choose how binary values are handled:
spec:
  source:
    sgCluster:
      debeziumProperties:
        includeUnknownDatatypes: true
        binaryHandlingMode: base64
For types that need special handling, register a custom converter:
spec:
  source:
    sgCluster:
      debeziumProperties:
        converters:
          geometry:
            type: io.debezium.connector.postgresql.converters.GeometryConverter
Events not being delivered to CloudEvent endpoint.
Verify the endpoint is reachable from inside the cluster:
kubectl run curl --rm -it --image=curlimages/curl -- \
  curl -v https://events.example.com/health
For endpoints with self-signed certificates, skip hostname verification:
spec:
  target:
    cloudEvent:
      http:
        skipHostnameVerification: true # For self-signed certs
For slow or unreliable endpoints, increase timeouts and the retry limit:
spec:
  target:
    cloudEvent:
      http:
        connectTimeout: "30s"
        readTimeout: "60s"
        retryLimit: 10
Stream pod restarts frequently.
# Check the last termination reason (e.g. OOMKilled)
kubectl describe pod -l stackgres.io/stream-name=my-stream | grep -A5 "Last State"
Solution: If the pod was OOMKilled, increase memory limits.
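One way to apply this without editing the full manifest is a merge patch against the spec.pods.resources block shown earlier; the values here are illustrative, not recommendations:
kubectl patch sgstream my-stream --type merge -p \
  '{"spec":{"pods":{"resources":{"requests":{"memory":"2Gi"},"limits":{"memory":"4Gi"}}}}}'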
Enable retries:
spec:
  source:
    sgCluster:
      debeziumProperties:
        errorsMaxRetries: 10
        retriableRestartConnectorWaitMs: 30000
Check PVC status:
kubectl get pvc -l stackgres.io/stream-name=my-stream
SGStream stuck in Terminating state.
Check which finalizers are blocking deletion:
kubectl get sgstream my-stream -o jsonpath='{.metadata.finalizers}'
If the operator cannot complete the cleanup, remove the finalizers manually:
kubectl patch sgstream my-stream -p '{"metadata":{"finalizers":null}}' --type=merge
After forcing deletion, clean up what the operator would normally remove on the source:
# Delete replication slot manually
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
"SELECT pg_drop_replication_slot('my_stream_slot')"
# Delete publication
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
"DROP PUBLICATION IF EXISTS my_stream_publication"
To stop a stream gracefully and clean up resources:
kubectl annotate sgstream my-stream \
debezium-signal.stackgres.io/tombstone='{}'
# Watch the stream status until the signal has been processed
kubectl get sgstream my-stream -w
# Then delete the stream
kubectl delete sgstream my-stream
Enable verbose logging for detailed troubleshooting:
spec:
  pods:
    customContainers:
    - name: stream
      env:
      - name: DEBUG_STREAM
        value: "true"
      - name: QUARKUS_LOG_LEVEL
        value: "DEBUG"
If issues persist, collect the following diagnostic information before asking for support:
# Stream status
kubectl get sgstream my-stream -o yaml > stream-status.yaml
# Pod logs
kubectl logs -l stackgres.io/stream-name=my-stream --tail=500 > stream-logs.txt
# Events
kubectl get events --field-selector involvedObject.name=my-stream > stream-events.txt
# Source database status
kubectl exec source-cluster-0 -c postgres-util -- psql -c \
"SELECT * FROM pg_replication_slots" > replication-slots.txt