Skip to content

CockroachDB — Operations

Scope

Cluster deployment, topology patterns, range management, backup/recovery, and monitoring.

Deployment Patterns

Cluster Topology

Pattern Min Nodes Survival Goal Use Case
Single-region, multi-zone 3 Zone failure Standard HA
Multi-region 9 (3 per region) Region failure Global distribution
Global tables 9+ Region failure + low reads Reference data everywhere
# Start a 3-node cluster
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 --http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259
cockroach init --insecure --host=localhost:26257

Performance Tuning

Parameter Default Tuned Impact
kv.range.max_bytes 512MiB 128MiB-512MiB Smaller = more parallelism
sql.defaults.distsql auto auto Distributed SQL execution
kv.snapshot_rebalance.max_rate 32MiB/s 64MiB/s Faster rebalancing
server.time_until_store_dead 5m 5m Dead node detection
-- Check range distribution
SELECT range_id, replicas, lease_holder FROM crdb_internal.ranges WHERE table_name = 'orders';

-- Hotspot detection
SELECT * FROM crdb_internal.node_statement_statistics ORDER BY count DESC LIMIT 10;

Backup & Recovery

-- Full cluster backup to cloud storage
BACKUP INTO 's3://bucket/backup?AUTH=implicit' AS OF SYSTEM TIME '-10s';

-- Incremental backup
BACKUP INTO LATEST IN 's3://bucket/backup?AUTH=implicit';

-- Restore
RESTORE FROM LATEST IN 's3://bucket/backup?AUTH=implicit';

Common Issues

Issue Diagnosis Fix
Range under-replicated DB Console > Replication Add nodes, check disk space
High query latency EXPLAIN ANALYZE Add indexes, check network
Clock skew cockroach node status Configure NTP, max-offset
Node decommission stuck cockroach node status --decommission Check range rebalancing progress