# Ceph — Operations

**Scope:** Cluster deployment, CRUSH map management, pool tuning, OSD operations, and health monitoring.
## Cluster Architecture
| Component | Role | Min Count |
|---|---|---|
| MON | Cluster state, Paxos consensus | 3 (odd number) |
| MGR | Metrics, dashboard, orchestrator | 2 (active/standby) |
| OSD | Data storage (1 per disk) | 3+ |
| MDS | CephFS metadata (if using CephFS) | 2+ |
| RGW | S3/Swift gateway (if using object) | 2+ |
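The MON minimum follows from Paxos: a cluster of N monitors needs a majority (floor(N/2) + 1) in quorum, so it tolerates floor((N-1)/2) failures. That is why an odd count of at least 3 is recommended, and why an even count buys no extra resilience. A sketch of the arithmetic:

```shell
# MON failures a cluster of N monitors can tolerate while still
# holding a Paxos majority (quorum = floor(N/2) + 1).
mon_failures_tolerated() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}

mon_failures_tolerated 3   # -> 1
mon_failures_tolerated 4   # -> 1 (an even count adds no resilience)
mon_failures_tolerated 5   # -> 2
```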
## Deployment Methods

```shell
# Cephadm (recommended for new clusters)
cephadm bootstrap --mon-ip 10.0.0.1 --initial-dashboard-user admin

# Add hosts
ceph orch host add node2 10.0.0.2
ceph orch host add node3 10.0.0.3

# Deploy OSDs on all available devices
ceph orch apply osd --all-available-devices
```
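After bootstrapping, it is worth confirming that the orchestrator sees every host and device before trusting the cluster with data. A minimal verification sequence (host names match the examples above):

```shell
# List managed hosts and their status
ceph orch host ls

# List devices the orchestrator found (and whether they are available for OSDs)
ceph orch device ls

# Confirm all expected daemons (MON, MGR, OSD) are running
ceph orch ps
```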
## Pool Management

```shell
# Create replicated pool (pg_num and pgp_num of 128)
ceph osd pool create mypool 128 128 replicated

# Create erasure-coded pool (higher storage efficiency, higher CPU cost)
ceph osd pool create ecpool 128 128 erasure

# Set replication factor (ceph osd pool set takes one key per invocation)
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2

# Enable compression
ceph osd pool set mypool compression_algorithm zstd
ceph osd pool set mypool compression_mode aggressive
```
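The `128 128` arguments above are the pool's `pg_num` and `pgp_num`. The common sizing rule of thumb is roughly 100 PGs per OSD divided by the replica count, rounded up to a power of two; on modern releases the `pg_autoscaler` can manage this instead. A sketch of the calculation (hypothetical helper, not a Ceph command):

```shell
# Rule-of-thumb PG count for a pool:
#   (OSDs * target PGs per OSD) / replicas, rounded up to a power of two.
pg_count() {
  local osds=$1 replicas=$2 target=${3:-100}
  local raw=$(( osds * target / replicas ))
  local pg=1
  while [ "$pg" -lt "$raw" ]; do pg=$(( pg * 2 )); done
  echo "$pg"
}

pg_count 12 3   # 12 OSDs, size 3 -> raw 400 -> 512
pg_count 3 3    # 3 OSDs, size 3 -> raw 100 -> 128
```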
## CRUSH Map

```shell
# View CRUSH hierarchy
ceph osd tree

# Create a rule that places each replica in a separate rack
ceph osd crush rule create-replicated replicated_rack default rack

# Place an OSD at a specific location in the hierarchy (CRUSH weight 1.0)
ceph osd crush set osd.5 1.0 root=default datacenter=dc1 rack=rack2 host=node5
```
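Creating a rule does not change placement by itself; a pool must be pointed at the rule before CRUSH applies it. Using the rule and pool names from the examples above:

```shell
# List available CRUSH rules
ceph osd crush rule ls

# Apply the rack-level failure-domain rule to an existing pool
ceph osd pool set mypool crush_rule replicated_rack
```

Expect data movement after switching rules: PGs whose replicas violate the new failure domain will backfill to compliant OSDs.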
## Health & Monitoring

```shell
# Cluster health
ceph health detail
ceph status

# OSD performance
ceph osd perf
ceph osd df

# PG status
ceph pg stat
ceph pg dump_stuck unclean

# Prometheus metrics (via MGR)
ceph mgr module enable prometheus
```
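For simple cron- or agent-based alerting without Prometheus, the first word of `ceph health` output (`HEALTH_OK`, `HEALTH_WARN`, `HEALTH_ERR`) can be mapped to an exit code. The wrapper below is a hypothetical sketch, not a Ceph tool:

```shell
# Map a Ceph health status string to an exit code for alerting
# (assumed convention: 0=OK, 1=warn, 2=error, 3=unknown/unreachable).
health_exit_code() {
  case "$1" in
    HEALTH_OK)   echo 0 ;;
    HEALTH_WARN) echo 1 ;;
    HEALTH_ERR)  echo 2 ;;
    *)           echo 3 ;;
  esac
}

# Usage against a live cluster:
#   status=$(ceph health | awk '{print $1}')
#   exit "$(health_exit_code "$status")"
health_exit_code HEALTH_WARN   # -> 1
```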
## Common Issues

| Issue | Diagnosis | Fix |
|---|---|---|
| HEALTH_WARN: PGs degraded | `ceph health detail` | Wait for recovery or add OSDs |
| OSD down | `ceph osd tree` | Check disk, restart OSD daemon |
| Slow requests | `ceph daemon osd.X perf dump` | Check disk latency, network |
| Near-full OSDs | `ceph osd df` | Reweight, add storage, delete data |
| Clock skew | `ceph health detail` | Configure NTP on all nodes |
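Concrete commands for the two most common fixes above, assuming `osd.5` as the example daemon (substitute the actual OSD ID from `ceph osd tree`):

```shell
# Restart a down OSD (cephadm-managed clusters)
ceph orch daemon restart osd.5

# Reduce the share of data a near-full OSD receives
# (override weight in [0,1]; lower values shift PGs to other OSDs)
ceph osd reweight osd.5 0.9
```

Note that `ceph osd reweight` sets the temporary override weight, distinct from the permanent CRUSH weight set via `ceph osd crush reweight`.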