CockroachDB — Operations

Scope

Cluster deployment, topology patterns, range management, backup/recovery, and monitoring.

Deployment Patterns

Cluster Topology

Pattern	Min Nodes	Survival Goal	Use Case
Single-region, multi-zone	3	Zone failure	Standard HA
Multi-region	9 (3 per region)	Region failure	Global distribution
Global tables	9+	Region failure + low-latency reads	Reference data read from every region
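
The multi-region and global-table patterns are configured per database in SQL. A minimal sketch, assuming a hypothetical database `app` with a reference table `currencies`, and placeholder region names that would have to match the `--locality=region=...` values the nodes were started with:

```shell
# Prints the multi-region setup SQL; nothing is executed here.
# To apply against a local insecure test cluster (assumed address):
#   multi_region_sql | cockroach sql --insecure --host=localhost:26257
multi_region_sql() {
  cat <<'SQL'
-- Region names are placeholders; they must match the nodes' --locality flags.
ALTER DATABASE app SET PRIMARY REGION "us-east1";
ALTER DATABASE app ADD REGION "us-west1";
ALTER DATABASE app ADD REGION "europe-west1";
-- Region survival needs at least 3 regions and raises write latency.
ALTER DATABASE app SURVIVE REGION FAILURE;
-- Global table: fast reads from every region, slower writes.
ALTER TABLE app.currencies SET LOCALITY GLOBAL;
SQL
}
```

Without the `SURVIVE REGION FAILURE` statement the database keeps the default zone-failure survival goal, which is usually the cheaper choice when a regional outage is an acceptable risk.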

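# Start a local 3-node cluster (--insecure is for testing only); each node needs its own store and ports
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 --http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node2 --listen-addr=localhost:26258 --http-addr=localhost:8081 --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=node3 --listen-addr=localhost:26259 --http-addr=localhost:8082 --join=localhost:26257,localhost:26258,localhost:26259 --background
# Initialize the cluster once, against any one node
cockroach init --insecure --host=localhost:26257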
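
After `cockroach init`, it is worth confirming that every node joined. The helper below is a sketch that counts live nodes from `cockroach node status --format=tsv`; the `is_live` column name and its `true`/`false` values are assumptions about that command's output, and the column is looked up by header rather than by position.

```shell
# Counts rows whose is_live column is "true" in TSV read from stdin.
count_live_nodes() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "is_live") col = i; next }
    col && $col == "true" { live++ }
    END { print live + 0 }
  '
}

# Usage sketch (local insecure cluster assumed):
#   cockroach node status --insecure --host=localhost:26257 --format=tsv | count_live_nodes
```

For the 3-node cluster above the expected count is 3; a lower number usually points at a node that failed to start or was given a wrong `--join` list.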

Performance Tuning

Parameter	Default	Tuned	Impact
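`range_max_bytes` (zone configuration, not a cluster setting)	512MiB	128MiB-512MiB	Smaller ranges split sooner and spread load across more nodes, at the cost of more per-range overhead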
`sql.defaults.distsql`	auto	auto	Distributed SQL execution
`kv.snapshot_rebalance.max_rate`	32MiB/s	64MiB/s	Faster rebalancing
`server.time_until_store_dead`	5m	5m	Dead node detection
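
The parameters in the table are changed in two different ways: cluster settings through `SET CLUSTER SETTING`, and range size through a zone configuration (`range_max_bytes`). A minimal sketch, using the `orders` table from the queries in this section and the tuned values from the table as illustrative numbers only:

```shell
# Prints the tuning SQL; nothing is executed here.
# To apply against a local insecure test cluster (assumed address):
#   tuning_sql | cockroach sql --insecure --host=localhost:26257
tuning_sql() {
  cat <<'SQL'
-- Cluster settings apply cluster-wide without a restart.
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '64MiB';
-- Inspect a setting before changing it.
SHOW CLUSTER SETTING server.time_until_store_dead;
-- Range size is a zone configuration; the value is in bytes (256 MiB here).
ALTER TABLE orders CONFIGURE ZONE USING range_max_bytes = 268435456;
SQL
}
```

Raising the snapshot rate speeds up rebalancing but competes with foreground traffic for disk and network bandwidth, so it is safer to change it in small steps and watch latency.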

-- Check range distribution (v23.1+; on older versions omit WITH DETAILS)
SHOW RANGES FROM TABLE orders WITH DETAILS;

-- Hotspot triage: most frequently executed statements on the current node (see also DB Console > Hot Ranges)
SELECT * FROM crdb_internal.node_statement_statistics ORDER BY count DESC LIMIT 10;

Backup & Recovery

-- Full cluster backup to cloud storage
BACKUP INTO 's3://bucket/backup?AUTH=implicit' AS OF SYSTEM TIME '-10s';

-- Incremental backup, appended to the most recent full backup in the collection
BACKUP INTO LATEST IN 's3://bucket/backup?AUTH=implicit';

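-- Full-cluster restore from the most recent backup (the target cluster must not contain user-created databases or tables)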
RESTORE FROM LATEST IN 's3://bucket/backup?AUTH=implicit';
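
Full and incremental backups are usually run on a schedule rather than by hand. A minimal sketch, assuming the same placeholder bucket as above, a hypothetical schedule label `nightly_backup`, and an illustrative cadence of daily incrementals with weekly fulls:

```shell
# Prints the backup-schedule SQL; nothing is executed here.
# To apply against a local insecure test cluster (assumed address):
#   backup_schedule_sql | cockroach sql --insecure --host=localhost:26257
backup_schedule_sql() {
  cat <<'SQL'
-- Daily incremental backups, with a new full backup once a week.
CREATE SCHEDULE nightly_backup
  FOR BACKUP INTO 's3://bucket/backup?AUTH=implicit'
  RECURRING '@daily'
  FULL BACKUP '@weekly'
  WITH SCHEDULE OPTIONS first_run = 'now';
-- List the backups in the collection before relying on a restore.
SHOW BACKUPS IN 's3://bucket/backup?AUTH=implicit';
SQL
}
```

Listing the collection after the first run is a cheap check that the schedule has write access to the bucket.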

Common Issues

Issue	Diagnosis	Fix
Range under-replicated	DB Console > Replication	Add nodes, check disk space
High query latency	`EXPLAIN ANALYZE`	Add indexes, check network
Clock skew	DB Console > Metrics > Runtime > Clock Offset, node logs	Sync clocks with NTP or chrony; keep `--max-offset` identical on every node (default 500ms)
Node decommission stuck	`cockroach node status --decommission`	Check that the replica count on the node keeps falling; confirm the remaining nodes have capacity and satisfy zone constraints
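
For a stuck decommission, the useful signal is whether the replica count on the node is still falling. The helper below is a sketch that extracts it from `cockroach node status --decommission --format=tsv`; the `id` and `gossiped_replicas` column names are assumptions about that command's output, and node ID 4 in the usage lines is hypothetical.

```shell
# Prints the gossiped_replicas value for node ID $1 from TSV read on stdin.
replicas_on_node() {
  awk -F'\t' -v id="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "id") idc = i
        if ($i == "gossiped_replicas") rc = i
      }
      next
    }
    idc && rc && $idc == id { print $rc }
  '
}

# Usage sketch (local insecure cluster assumed):
#   cockroach node decommission 4 --insecure --host=localhost:26257 --wait=none
#   cockroach node status --decommission --insecure --host=localhost:26257 --format=tsv | replicas_on_node 4
```

If the number stops changing between runs, the remaining nodes most likely cannot accept the replicas, which matches the fix column above.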