PostgreSQL — Operations¶
Scope
Production deployment, replication, backup/recovery, performance tuning, and monitoring for PostgreSQL.
Deployment Patterns¶
Replication Topologies¶
| Pattern | Consistency | Failover | Use Case |
|---|---|---|---|
| Streaming (async) | Eventual | Manual/automatic | Standard HA |
| Streaming (sync) | Strong | Automatic | Zero data loss |
| Logical replication | Eventual | Manual | Cross-version, selective |
| Patroni + etcd | Strong | Automatic | Enterprise K8s deployments |
| PgBouncer + HAProxy | N/A (proxy) | Automatic | Connection pooling |
Connection Pooling¶
# PgBouncer recommended settings
[pgbouncer]
pool_mode = transaction # Best for most workloads
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_lifetime = 3600
server_idle_timeout = 600
Connection Limits
PostgreSQL defaults to max_connections = 100. Each connection consumes ~10MB of memory. Use PgBouncer to multiplex thousands of application connections.
Performance Tuning¶
Memory Configuration¶
| Parameter | Formula | Example (32Gi RAM) |
|---|---|---|
shared_buffers |
25% of RAM | 8GB |
effective_cache_size |
75% of RAM | 24GB |
work_mem |
RAM / max_connections / 4 | 32MB |
maintenance_work_mem |
RAM / 16 | 2GB |
wal_buffers |
3% of shared_buffers | 256MB (cap at 256MB) |
WAL & Checkpoint Tuning¶
-- Production WAL settings
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET min_wal_size = '1GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET wal_compression = 'zstd'; -- PG 15+
Query Performance¶
-- Enable query statistics
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slow queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Identify missing indexes
SELECT schemaname, tablename, seq_scan, seq_tup_read,
idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;
Backup & Recovery¶
pg_basebackup (Physical)¶
# Full base backup
pg_basebackup -h primary -U replication -D /backup/base \
--checkpoint=fast --wal-method=stream -z
# Point-in-time recovery
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2026-04-12 10:00:00'
pgBackRest (Recommended)¶
# Full backup
pgbackrest --stanza=main --type=full backup
# Incremental backup
pgbackrest --stanza=main --type=incr backup
# Restore to point-in-time
pgbackrest --stanza=main --type=time \
--target="2026-04-12 10:00:00+00" restore
Monitoring¶
Key Metrics¶
-- Active connections vs limit
SELECT count(*), (SELECT setting FROM pg_settings WHERE name='max_connections')
FROM pg_stat_activity;
-- Replication lag (on replica)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
-- Cache hit ratio (should be > 99%)
SELECT sum(blks_hit) / (sum(blks_hit) + sum(blks_read)) AS cache_hit_ratio
FROM pg_stat_database;
-- Transaction rate
SELECT xact_commit + xact_rollback AS total_txn FROM pg_stat_database WHERE datname = current_database();
Common Issues¶
| Issue | Diagnosis | Fix |
|---|---|---|
| Connection exhausted | SELECT count(*) FROM pg_stat_activity |
Use PgBouncer, increase max_connections |
| Bloated tables | SELECT pg_size_pretty(pg_table_size('table')) |
VACUUM FULL or pg_repack |
| Slow queries | pg_stat_statements |
Add indexes, rewrite query |
| WAL disk full | SELECT pg_current_wal_lsn() |
Archive WAL, increase max_wal_size |
| Replication lag | pg_stat_replication |
Check network, increase wal_sender_timeout |