Skip to content

PostgreSQL — Operations

Scope

Production deployment, replication, backup/recovery, performance tuning, and monitoring for PostgreSQL.

Deployment Patterns

Replication Topologies

Pattern Consistency Failover Use Case
Streaming (async) Eventual Manual/automatic Standard HA
Streaming (sync) Strong Automatic Zero data loss
Logical replication Eventual Manual Cross-version, selective
Patroni + etcd Strong Automatic Enterprise K8s deployments
PgBouncer + HAProxy N/A (proxy) Automatic Connection pooling

Connection Pooling

# PgBouncer recommended settings
[pgbouncer]
pool_mode = transaction         # Best for most workloads
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_lifetime = 3600
server_idle_timeout = 600

Connection Limits

PostgreSQL defaults to max_connections = 100. Each connection consumes ~10MB of memory. Use PgBouncer to multiplex thousands of application connections.

Performance Tuning

Memory Configuration

Parameter Formula Example (32Gi RAM)
shared_buffers 25% of RAM 8GB
effective_cache_size 75% of RAM 24GB
work_mem RAM / max_connections / 4 32MB
maintenance_work_mem RAM / 16 2GB
wal_buffers 3% of shared_buffers 256MB (cap at 256MB)

WAL & Checkpoint Tuning

-- Production WAL settings
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET min_wal_size = '1GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET wal_compression = 'zstd';  -- PG 15+

Query Performance

-- Enable query statistics
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Identify missing indexes
SELECT schemaname, tablename, seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;

Backup & Recovery

pg_basebackup (Physical)

# Full base backup
pg_basebackup -h primary -U replication -D /backup/base \
  --checkpoint=fast --wal-method=stream -z

# Point-in-time recovery
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2026-04-12 10:00:00'
# Full backup
pgbackrest --stanza=main --type=full backup

# Incremental backup
pgbackrest --stanza=main --type=incr backup

# Restore to point-in-time
pgbackrest --stanza=main --type=time \
  --target="2026-04-12 10:00:00+00" restore

Monitoring

Key Metrics

-- Active connections vs limit
SELECT count(*), (SELECT setting FROM pg_settings WHERE name='max_connections')
FROM pg_stat_activity;

-- Replication lag (on replica)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

-- Cache hit ratio (should be > 99%)
SELECT sum(blks_hit) / (sum(blks_hit) + sum(blks_read)) AS cache_hit_ratio
FROM pg_stat_database;

-- Transaction rate
SELECT xact_commit + xact_rollback AS total_txn FROM pg_stat_database WHERE datname = current_database();

Common Issues

Issue Diagnosis Fix
Connection exhausted SELECT count(*) FROM pg_stat_activity Use PgBouncer, increase max_connections
Bloated tables SELECT pg_size_pretty(pg_table_size('table')) VACUUM FULL or pg_repack
Slow queries pg_stat_statements Add indexes, rewrite query
WAL disk full SELECT pg_current_wal_lsn() Archive WAL, increase max_wal_size
Replication lag pg_stat_replication Check network, increase wal_sender_timeout