Ceph — Operations

Scope

Cluster deployment, CRUSH map management, pool tuning, OSD operations, and health monitoring.

Cluster Architecture

Component | Role                                | Min Count
MON       | Cluster state, Paxos consensus      | 3 (odd number)
MGR       | Metrics, dashboard, orchestrator    | 2 (active/standby)
OSD       | Data storage (1 per disk)           | 3+
MDS       | CephFS metadata (if using CephFS)   | 2+
RGW       | S3/Swift gateway (if using object)  | 2+
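A quick way to confirm a running cluster matches this layout is the per-daemon one-line status commands; counts in the output should line up with the table:

```shell
# One-line status per daemon type
ceph mon stat   # expect 3 monitors, all in quorum
ceph mgr stat   # shows the active MGR and standby count
ceph osd stat   # total OSDs, and how many are up/in
ceph mds stat   # only meaningful if CephFS is deployed
```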

Deployment Methods

# Cephadm (recommended for new clusters)
cephadm bootstrap --mon-ip 10.0.0.1 --initial-dashboard-user admin

# Add hosts
ceph orch host add node2 10.0.0.2
ceph orch host add node3 10.0.0.3

# Deploy OSDs on all available devices
ceph orch apply osd --all-available-devices
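After bootstrapping and applying the OSD spec, it is worth verifying that all hosts joined and OSDs actually came up before creating pools. A sketch using standard cephadm orchestrator commands (host names match the example above):

```shell
# Confirm the hosts were registered with the orchestrator
ceph orch host ls

# Watch OSD daemons come up across the cluster
ceph orch ps --daemon-type osd

# Check the resulting CRUSH placement of the new OSDs
ceph osd tree
```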

Pool Management

# Create replicated pool
ceph osd pool create mypool 128 128 replicated

# Create erasure coded pool (higher storage efficiency)
ceph osd pool create ecpool 128 128 erasure

# Set replication factor (one key per invocation)
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2

# Enable compression
ceph osd pool set mypool compression_algorithm zstd
ceph osd pool set mypool compression_mode aggressive
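The pg_num of 128 used above follows a common rule of thumb: target roughly 100 PGs per OSD, divide by the replica count, and round up to the next power of two (in recent releases the PG autoscaler can manage this for you via `ceph osd pool set mypool pg_autoscale_mode on`). A sketch of the arithmetic with hypothetical cluster values:

```shell
#!/bin/sh
# Rule-of-thumb pg_num: (OSDs * target PGs per OSD) / replica size,
# rounded up to the next power of two. Values here are hypothetical.
osds=3; size=3; target=100
raw=$(( osds * target / size ))   # 100 for this example
pg=1
while [ "$pg" -lt "$raw" ]; do pg=$(( pg * 2 )); done
echo "pg_num=$pg"                 # prints pg_num=128
```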

CRUSH Map

# View CRUSH hierarchy
ceph osd tree

# Create failure domain rule
ceph osd crush rule create-replicated replicated_rack default rack

# Move OSD to specific host/rack
ceph osd crush set osd.5 1.0 root=default datacenter=dc1 rack=rack2 host=node5
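A CRUSH rule has no effect until a pool references it. Assigning the rack-level rule created above to the example pool, and verifying the assignment:

```shell
# Point the pool at the rack failure-domain rule
ceph osd pool set mypool crush_rule replicated_rack

# Confirm which rule the pool now uses
ceph osd pool get mypool crush_rule
```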

Health & Monitoring

# Cluster health
ceph health detail
ceph status

# OSD performance
ceph osd perf
ceph osd df

# PG status
ceph pg stat
ceph pg dump_stuck unclean

# Prometheus metrics (via MGR)
ceph mgr module enable prometheus
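Once the prometheus module is enabled, the active MGR serves metrics over HTTP on port 9283 by default. A quick scrape check (the hostname is a placeholder for wherever the active MGR runs):

```shell
# Fetch raw metrics from the active MGR (default port 9283)
curl -s http://node1:9283/metrics | head -20

# Count reported ceph_osd_up series (should equal the OSD count)
curl -s http://node1:9283/metrics | grep -c '^ceph_osd_up'
```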

Common Issues

Issue                     | Diagnosis                   | Fix
HEALTH_WARN: PGs degraded | ceph health detail          | Wait for recovery or add OSDs
OSD down                  | ceph osd tree               | Check disk, restart OSD daemon
Slow requests             | ceph daemon osd.X perf dump | Check disk latency, network
Near-full OSDs            | ceph osd df                 | Reweight, add storage, delete data
Clock skew                | ceph health detail          | Configure NTP on all nodes
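Representative commands for the fixes above (osd.5 and the reweight value are illustrative, not prescriptive):

```shell
# Restart a down OSD daemon on a cephadm-managed cluster
ceph orch daemon restart osd.5

# Temporarily reduce a near-full OSD's share of data
ceph osd reweight osd.5 0.9

# Inspect monitor clock skew, then verify time sync on each node
ceph time-sync-status
chronyc tracking
```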