CockroachDB — Operations
Scope
Cluster deployment, topology patterns, range management, backup/recovery, and monitoring.
Deployment Patterns
Cluster Topology
| Pattern | Min Nodes | Survival Goal | Use Case |
|---|---|---|---|
| Single-region, multi-zone | 3 | Zone failure | Standard HA |
| Multi-region | 9 (3 per region) | Region failure | Global distribution |
| Global tables | 9+ | Region failure + low-latency reads | Reference data everywhere |
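The survival goals above follow from majority quorum: a range stays available only while more than half of its replicas are reachable. The sketch below works through that arithmetic. The helper name `max_replica_failures` is hypothetical and not part of any CockroachDB tooling.

```shell
# Sketch: how many replica failures a range tolerates for a given
# replication factor. max_replica_failures is a hypothetical helper.
max_replica_failures() {
  # A majority is floor(n/2) + 1 replicas, so the range tolerates
  # n - majority = floor((n - 1) / 2) failures.
  echo $(( ($1 - 1) / 2 ))
}

for rf in 3 5; do
  echo "replication factor $rf tolerates $(max_replica_failures "$rf") failure(s)"
done
```

With 3 replicas spread over 3 zones, losing one zone still leaves a majority. Surviving the loss of a whole region generally needs 5 replicas spread over at least 3 regions, which is why the multi-region patterns start at larger node counts.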
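# Start a 3-node local cluster (insecure mode, for testing only)
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 --http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node2 --listen-addr=localhost:26258 --http-addr=localhost:8081 --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node3 --listen-addr=localhost:26259 --http-addr=localhost:8082 --join=localhost:26257,localhost:26258,localhost:26259 --background
# Initialize the cluster once, against any one node
cockroach init --insecure --host=localhost:26257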
Range Management
| Parameter | Default | Tuned | Impact |
|---|---|---|---|
| range_max_bytes (zone config) | 512MiB | 128MiB-512MiB | Smaller ranges = more parallelism |
| sql.defaults.distsql | auto | auto | Distributed SQL execution |
| kv.snapshot_rebalance.max_rate | 32MiB/s | 64MiB/s | Faster rebalancing |
| server.time_until_store_dead | 5m | 5m | Dead node detection |
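The parameters above are applied in two different ways: cluster settings through `SET CLUSTER SETTING`, and range size through a zone configuration (the `range_max_bytes` field, in bytes). The following sketch only builds the SQL text. The helper names `cluster_setting_sql` and `range_size_sql` are hypothetical, and the table name `orders` is taken from the queries in this section.

```shell
# Sketch: build the SQL for the two kinds of tuning listed in the table.
# cluster_setting_sql and range_size_sql are hypothetical helper names.

# Cluster-wide settings are changed with SET CLUSTER SETTING.
cluster_setting_sql() {
  printf "SET CLUSTER SETTING %s = '%s';\n" "$1" "$2"
}

# Range size is a zone-config field given in bytes; convert from MiB.
range_size_sql() {
  printf "ALTER TABLE %s CONFIGURE ZONE USING range_max_bytes = %s;\n" \
    "$1" "$(( $2 * 1024 * 1024 ))"
}

cluster_setting_sql kv.snapshot_rebalance.max_rate 64MiB
range_size_sql orders 256
```

To apply a statement, its output can be piped into a SQL shell, for example `range_size_sql orders 256 | cockroach sql --insecure --host=localhost:26257`. Printing first makes it easy to review a change before it reaches the cluster.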
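-- Check range distribution
-- (v23.1+: crdb_internal.ranges no longer has a table_name column; use SHOW RANGES FROM TABLE orders WITH DETAILS instead)
SELECT range_id, replicas, lease_holder FROM crdb_internal.ranges WHERE table_name = 'orders';
-- Hotspot detection: most frequently executed statements on this node
SELECT * FROM crdb_internal.node_statement_statistics ORDER BY count DESC LIMIT 10;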
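A quick way to read the range distribution query is to count leaseholders per node, since one node holding most of a table's leases often points to a hotspot. The sketch below assumes tab-separated output with a header row and the lease holder's node ID in the last column. The helper name `leases_per_node` is hypothetical and the sample rows are made up for illustration.

```shell
# Sketch: count leaseholders per node from tab-separated query output.
# Assumes a header row and the lease holder's node ID in the last column.
# leases_per_node is a hypothetical helper name.
leases_per_node() {
  awk -F '\t' 'NR > 1 { n[$NF]++ }
               END { for (id in n) printf "node %s: %d\n", id, n[id] }' | sort
}

# Made-up sample rows standing in for real output of the range query,
# e.g. from: cockroach sql --insecure --format=tsv -e "<range query>"
printf 'range_id\treplicas\tlease_holder\n41\t{1,2,3}\t1\n42\t{1,2,3}\t1\n43\t{1,2,3}\t3\n' \
  | leases_per_node
# prints:
# node 1: 2
# node 3: 1
```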
Backup & Recovery
-- Full cluster backup to cloud storage
BACKUP INTO 's3://bucket/backup?AUTH=implicit' AS OF SYSTEM TIME '-10s';
-- Incremental backup, appended to the most recent full backup in the collection
BACKUP INTO LATEST IN 's3://bucket/backup?AUTH=implicit';
-- Restore the most recent backup (a full-cluster restore requires a cluster with no user-created data)
RESTORE FROM LATEST IN 's3://bucket/backup?AUTH=implicit';
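Manual backups like the ones above are usually wrapped in a schedule. The following sketch builds a `CREATE SCHEDULE FOR BACKUP` statement that takes incremental backups at one frequency and full backups at another. The helper name `backup_schedule_sql` and the chosen frequencies are assumptions for illustration. Whether incremental schedules are available may depend on the CockroachDB version and license.

```shell
# Sketch: emit a CREATE SCHEDULE statement for recurring backups.
# backup_schedule_sql is a hypothetical helper name.
#   $1 = backup collection URI
#   $2 = incremental backup frequency (crontab syntax or @hourly, @daily, ...)
#   $3 = full backup frequency
backup_schedule_sql() {
  printf "CREATE SCHEDULE FOR BACKUP INTO '%s' RECURRING '%s' FULL BACKUP '%s' WITH SCHEDULE OPTIONS first_run = 'now';\n" \
    "$1" "$2" "$3"
}

backup_schedule_sql 's3://bucket/backup?AUTH=implicit' '@hourly' '@daily'
```

The printed statement can be reviewed and then piped into `cockroach sql`. `SHOW SCHEDULES` lists the schedules that exist afterwards.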
Common Issues
| Issue | Diagnosis | Fix |
|---|---|---|
| Range under-replicated | DB Console > Replication | Add nodes, check disk space |
| High query latency | EXPLAIN ANALYZE | Add indexes, check network |
| Clock skew | cockroach node status | Configure NTP, max-offset |
| Node decommission stuck | cockroach node status --decommission | Check range rebalancing progress |
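For the clock skew row, it helps to know how close a node is to the limit. A node shuts itself down when its clock drifts from at least half of the other nodes by more than 80% of the maximum offset (500ms by default). The sketch below checks a measured offset against that threshold. The helper name `offset_is_dangerous` is hypothetical, and the offset value is assumed to come from your own monitoring.

```shell
# Sketch: flag a clock offset that exceeds 80% of the allowed maximum.
# Arguments are in milliseconds; 500 is the default --max-offset.
# offset_is_dangerous is a hypothetical helper name.
offset_is_dangerous() {
  offset_ms=$1
  max_offset_ms=${2:-500}
  # Integer arithmetic: offset > 0.8 * max  <=>  offset * 10 > max * 8
  [ $(( offset_ms * 10 )) -gt $(( max_offset_ms * 8 )) ]
}

if offset_is_dangerous 450; then echo "450ms: at risk of shutdown"; fi
if ! offset_is_dangerous 120; then echo "120ms: within tolerance"; fi
```

Alerting well below the 80% mark leaves time to fix NTP before nodes start removing themselves from the cluster.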