
Operations

Deployment, configuration, scaling, and day-2 operations for Coroot.

Deployment

The Coroot Operator manages the lifecycle of all Coroot components via Custom Resources:

# Add Helm repo
helm repo add coroot https://coroot.github.io/helm-charts
helm repo update coroot

# Install the Operator
helm install -n coroot --create-namespace \
  coroot-operator coroot/coroot-operator

# Deploy Community Edition
helm install -n coroot coroot coroot/coroot-ce \
  --set "clickhouse.shards=2,clickhouse.replicas=2"

Docker Compose (Development)

git clone https://github.com/coroot/coroot.git
cd coroot
docker compose up -d

Multi-Cluster Setup

For multi-cluster observability, deploy the main Coroot instance in a central "observability" cluster. In each remote cluster, install the operator and deploy Coroot with agentsOnly=true, so that only the agents run there and report to the central instance:

# Remote cluster — agents only
helm install -n coroot --create-namespace \
  coroot-operator coroot/coroot-operator

helm install -n coroot coroot coroot/coroot-ce \
  --set "agentsOnly=true" \
  --set "agents.server.url=https://central-coroot.example.com"
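With several remote clusters, the two commands above can be wrapped in a small helper that takes a kube context. This is a minimal sketch, not part of Coroot: the function names run_or_print and install_remote_agents, the DRY_RUN variable, and the context names in the usage note are hypothetical. It relies only on Helm's global --kube-context flag and the chart values already shown above.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: agents-only install for one remote cluster.
#   $1 = kube context of the remote cluster
#   $2 = URL of the central Coroot instance
install_remote_agents() {
  run_or_print helm --kube-context "$1" install -n coroot --create-namespace \
    coroot-operator coroot/coroot-operator || return 1
  run_or_print helm --kube-context "$1" install -n coroot coroot coroot/coroot-ce \
    --set agentsOnly=true \
    --set "agents.server.url=$2"
}
```

For example, setting DRY_RUN=1 and calling install_remote_agents edge-1 https://central-coroot.example.com prints the two helm commands for review. Unsetting DRY_RUN and looping over the contexts (for ctx in edge-1 edge-2; do ...; done) would run them.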

Configuration

Custom Resource (CR) Configuration

Parameter               Default     Description
metricsRefreshInterval  15s         Prometheus query resolution
cacheTTL                720h        Metric cache retention
clickhouse.shards       1           ClickHouse shard count
clickhouse.replicas     1           ClickHouse replica count
registry.url            Docker Hub  Custom container registry
registry.pullSecret     -           Image pull secret name

Private Registry Configuration

helm install -n coroot coroot coroot/coroot-ce \
  --set registry.url=https://registry.YOUR_DOMAIN/coroot \
  --set registry.pullSecret=coroot-registry-auth

Node Agent Configuration

Flag / env var    Description
--cgroupfs-root   Override cgroup filesystem path
--listen          Agent listen address (default: :80)
--wal-dir         Write-ahead log directory
ENABLE_JAVA_TLS   Enable Java TLS profiling support

Scaling

Vertical Scaling

  • Coroot Server: Increase CPU/memory for larger service maps and more concurrent inspections
  • ClickHouse: Scale storage and memory based on log/trace retention

Horizontal Scaling

  • ClickHouse sharding: Distribute data across multiple shards for write throughput
  • ClickHouse replication: Add replicas for read performance and HA
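Sharding and replication multiply: as a rough sizing aid, the number of ClickHouse pods to plan capacity for can be estimated as shards times replicas. The sketch below assumes that clickhouse.replicas counts replicas per shard, which is worth confirming against the chart you deploy. The function name clickhouse_pod_count is hypothetical.

```shell
# Hypothetical helper: estimate the ClickHouse pod count.
# Assumption: clickhouse.replicas is the number of replicas per shard.
#   $1 = clickhouse.shards
#   $2 = clickhouse.replicas
clickhouse_pod_count() {
  echo $(( $1 * $2 ))
}
```

Under that assumption, the clickhouse.shards=2,clickhouse.replicas=2 setting used in the deployment example would give clickhouse_pod_count 2 2, that is, four ClickHouse pods, each needing its own storage and memory.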

Storage Tuning

Coroot v1.18.6+ supports S3 storage for ClickHouse, eliminating local disk requirements:

# ClickHouse S3 storage configuration
clickhouse:
  storage:
    type: s3
    s3:
      endpoint: https://s3.amazonaws.com
      bucket: coroot-clickhouse
      accessKeyId: AKIAIOSFODNN7EXAMPLE
      secretAccessKey: wJalrXUtnFEMI/K7MDENG

Monitoring

Health Checks

# Check Coroot CR status
kubectl get coroot -n coroot

# Check all pods
kubectl get pods -n coroot

# Port-forward to access UI
kubectl port-forward -n coroot service/coroot-coroot 8080:8080

Key Metrics to Watch

Metric                  Alert threshold  Description
ClickHouse disk usage   > 80%            Risk of storage exhaustion
ClickHouse memory       > 70% RAM        Query performance degradation
Node agent restarts     > 3/hour         eBPF loading issues
Inspection queue depth  > 1000           Server overloaded
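The thresholds in the table can be turned into a simple scripted check. The following is only a sketch: the function names usage_pct and check_threshold and the metric labels are hypothetical, and how the raw numbers are collected (for example from df output or your monitoring system) is left open.

```shell
# Hypothetical helper: integer percentage of USED out of TOTAL (rounded down).
#   $1 = used, $2 = total (same unit, e.g. bytes)
usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Hypothetical helper: compare an integer VALUE against an integer LIMIT.
# Prints ALERT and returns 1 when VALUE is strictly greater than LIMIT.
#   $1 = metric label, $2 = value, $3 = limit
check_threshold() {
  if [ "$2" -gt "$3" ]; then
    echo "ALERT $1: $2 > $3"
    return 1
  fi
  echo "OK $1: $2 <= $3"
}
```

For instance, check_threshold clickhouse_disk_pct "$(usage_pct 850 1000)" 80 reports an alert because 85 exceeds the 80% limit, while check_threshold node_agent_restarts_per_hour 2 3 reports OK.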

Upgrades

# Upgrade operator
helm repo update coroot
helm upgrade -n coroot coroot-operator coroot/coroot-operator

# Upgrade Coroot CE
helm upgrade -n coroot coroot coroot/coroot-ce

Backup & Restore

Backup is handled through the storage backends:

  • Prometheus/VictoriaMetrics metrics: Use vmbackup for VictoriaMetrics, or the backup tooling of Thanos or Mimir if metrics are stored there
  • ClickHouse: Use clickhouse-backup tool

Commands & Recipes

Runnable commands, configuration snippets, and troubleshooting recipes for Coroot.

Installation

Helm Install (Kubernetes)

# Add repo and install operator
helm repo add coroot https://coroot.github.io/helm-charts
helm repo update coroot
helm install -n coroot --create-namespace coroot-operator coroot/coroot-operator

# Install Community Edition
helm install -n coroot coroot coroot/coroot-ce

# Install with custom ClickHouse sizing
helm install -n coroot coroot coroot/coroot-ce \
  --set "clickhouse.shards=2,clickhouse.replicas=2" \
  --set "clickhouse.resources.requests.memory=4Gi"

Docker Compose

git clone https://github.com/coroot/coroot.git && cd coroot
docker compose up -d
# UI: http://localhost:8080

Access & Verification

# Port-forward to UI
kubectl port-forward -n coroot service/coroot-coroot 8080:8080

# Check operator status
kubectl get coroot -n coroot

# List all Coroot pods
kubectl get pods -n coroot -o wide

# Check node-agent logs
kubectl logs -n coroot -l app=coroot-node-agent --tail=50

# Check cluster-agent logs
kubectl logs -n coroot -l app=coroot-cluster-agent --tail=50

Configuration Recipes

Enable TLS for Coroot Server (v1.19.0+)

helm upgrade -n coroot coroot coroot/coroot-ce \
  --set "server.tls.enabled=true" \
  --set "server.tls.certFile=/etc/coroot/tls/tls.crt" \
  --set "server.tls.keyFile=/etc/coroot/tls/tls.key"

Private Registry

helm install -n coroot coroot coroot/coroot-ce \
  --set registry.url=https://registry.YOUR_DOMAIN/coroot \
  --set registry.pullSecret=coroot-registry-auth

Multi-Cluster (Remote Agents Only)

# On remote cluster
helm install -n coroot --create-namespace coroot-operator coroot/coroot-operator
helm install -n coroot coroot coroot/coroot-ce \
  --set "agentsOnly=true" \
  --set "agents.server.url=https://central-coroot.example.com"

Troubleshooting

eBPF Agent Not Collecting Data

# Check kernel version (needs 4.16+)
uname -r

# Check if node-agent has privileged access
kubectl get pod -n coroot -l app=coroot-node-agent -o jsonpath='{.items[0].spec.containers[0].securityContext}'

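# Check node-agent eBPF loading errors
# (with a label selector, kubectl logs returns only the last 10 lines per pod unless --tail is set)
kubectl logs -n coroot -l app=coroot-node-agent --tail=-1 | grep -iE "error|failed|ebpf"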

# Verify cgroup mount
kubectl exec -n coroot $(kubectl get pod -n coroot -l app=coroot-node-agent -o name | head -1) -- ls /sys/fs/cgroup/
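The kernel version check above can be scripted instead of compared by eye. This is a minimal sketch that assumes a sort with version ordering (-V, available in GNU coreutils); the function name kernel_at_least is hypothetical, and the 4.16 minimum is taken from the comment above.

```shell
# Hypothetical helper: succeed if the kernel version is at least REQUIRED.
#   $1 = required version, e.g. 4.16
#   $2 = version to test (defaults to the running kernel, uname -r)
kernel_at_least() {
  kal_current="${2:-$(uname -r)}"
  # Drop distribution suffixes such as "-91-generic".
  kal_current="${kal_current%%-*}"
  # Version-sort both values; if the required one sorts first (or they are
  # equal), the current kernel is new enough.
  kal_lowest=$(printf '%s\n%s\n' "$1" "$kal_current" | sort -V | head -n 1)
  [ "$kal_lowest" = "$1" ]
}
```

On a node, kernel_at_least 4.16 && echo "kernel OK" || echo "kernel too old" gives a quick answer. Passing a second argument tests an arbitrary version string, for example one collected from another host.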

ClickHouse Storage Full

# Check ClickHouse disk usage
kubectl exec -n coroot $(kubectl get pod -n coroot -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "SELECT formatReadableSize(sum(bytes_on_disk)) FROM system.parts"

# Check retention settings
kubectl exec -n coroot $(kubectl get pod -n coroot -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "SELECT database, table, engine_full FROM system.tables WHERE database LIKE 'coroot%'"

# Force merge old parts (OPTIMIZE ... FINAL is I/O-heavy and needs free disk space for the merged parts)
kubectl exec -n coroot $(kubectl get pod -n coroot -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "OPTIMIZE TABLE coroot_traces.spans FINAL"
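The three commands above repeat the same pod lookup, so a small wrapper can make ad-hoc queries less error-prone. This is a sketch under the same assumptions as the commands above (namespace coroot, pod label app=clickhouse); the function name ch_query is hypothetical.

```shell
# Hypothetical helper: run one SQL query in the first ClickHouse pod found.
#   $1 = SQL query text
ch_query() {
  ch_pod=$(kubectl get pod -n coroot -l app=clickhouse -o name | head -n 1)
  if [ -z "$ch_pod" ]; then
    echo "no ClickHouse pod found in namespace coroot" >&2
    return 1
  fi
  kubectl exec -n coroot "$ch_pod" -- clickhouse-client --query "$1"
}

# Example (not executed here): largest tables by on-disk size of active parts.
# ch_query "SELECT database, table, formatReadableSize(sum(bytes_on_disk)) AS size
#           FROM system.parts WHERE active
#           GROUP BY database, table ORDER BY sum(bytes_on_disk) DESC LIMIT 10"
```

Breaking disk usage down per table, as in the commented example, helps decide which table's retention to shorten before forcing any merges.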

Upgrade

helm repo update coroot
helm upgrade -n coroot coroot-operator coroot/coroot-operator
helm upgrade -n coroot coroot coroot/coroot-ce

# Verify
kubectl get pods -n coroot
kubectl get coroot -n coroot
