Operations

Deployment, configuration, scaling, and day-2 operations for SigNoz.

Deployment

Kubernetes (Helm) — Production

# Add Helm repo
helm repo add signoz https://charts.signoz.io
helm repo update

# Install
kubectl create namespace signoz
helm install signoz signoz/signoz --namespace signoz -f values.yaml

Prerequisites: Kubernetes >= 1.22, Helm >= 3.8

Minimum resources: 8 GB RAM, 4 vCPU, 30 GB storage
Recommended: 16 GB RAM, 8 vCPU, 80 GB storage
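To turn these figures into a preflight check, here is a minimal bash sketch. `signoz_resource_tier` is a hypothetical helper, not part of SigNoz. It only compares the numbers you pass in (RAM in GB, vCPU count, storage in GB) against the minimum and recommended values listed above.

```shell
# Hypothetical helper: classify resources against the minimum
# (8 GB / 4 vCPU / 30 GB) and recommended (16 GB / 8 vCPU / 80 GB) figures.
# Usage: signoz_resource_tier MEM_GB VCPU DISK_GB
signoz_resource_tier() {
  local mem_gb=$1 vcpu=$2 disk_gb=$3
  if (( mem_gb >= 16 && vcpu >= 8 && disk_gb >= 80 )); then
    echo "recommended"
  elif (( mem_gb >= 8 && vcpu >= 4 && disk_gb >= 30 )); then
    echo "minimum"
  else
    echo "insufficient"
    return 1
  fi
}

signoz_resource_tier 12 4 50   # prints "minimum"
```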

Docker Compose — Development

git clone https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker/clickhouse-setup
docker compose up -d
# UI: http://localhost:3301
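The containers may need a while before the UI responds. A minimal polling sketch, assuming `curl` is available; `wait_for_http` and its retry defaults are hypothetical choices, not part of the SigNoz tooling.

```shell
# Hypothetical helper: poll a URL until curl gets a non-error response
# (curl -f treats HTTP 400 and above as failure).
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url=$1 attempts=${2:-30} delay=${3:-5} i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: fail on HTTP errors; -sS: silent except for errors; discard the body
    if curl -fsS -o /dev/null "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not reachable after $attempts attempt(s)" >&2
  return 1
}

# Example: wait up to roughly 2.5 minutes for the UI
# wait_for_http http://localhost:3301 30 5
```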

Configuration

Retention Policies

Configure via SigNoz UI (Settings > General) or ClickHouse TTL:

Signal | Default (self-hosted) | Default (SigNoz Cloud)
Traces | 7 days | 15 days
Logs | 7 days | 15 days
Metrics | 30 days | 30 days

ClickHouse Tuning

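<!-- config.xml (server-level settings) -->
<max_server_memory_usage_to_ram_ratio>0.7</max_server_memory_usage_to_ram_ratio>
<max_concurrent_queries>100</max_concurrent_queries>
<!-- users.xml: max_threads is a per-query setting, so set it in a profile -->
<profiles><default><max_threads>16</max_threads></default></profiles>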
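`max_server_memory_usage_to_ram_ratio` caps ClickHouse memory at a fraction of the host's RAM, so 0.7 leaves roughly 30% for the OS, the page cache, and anything else on the node. A small sketch of that arithmetic for the node sizes used in the cluster sizing table later on this page; `ch_memory_cap_gb` is a hypothetical helper.

```shell
# Hypothetical helper: effective ClickHouse memory cap for a RAM size and ratio.
# Usage: ch_memory_cap_gb RAM_GB [RATIO]   (RATIO defaults to 0.7)
ch_memory_cap_gb() {
  local ram_gb=$1 ratio=${2:-0.7}
  # awk does the floating-point multiplication that bash arithmetic cannot
  awk -v r="$ram_gb" -v k="$ratio" 'BEGIN { printf "%.1f\n", r * k }'
}

for ram in 16 32 64; do
  echo "${ram} GB node -> $(ch_memory_cap_gb "$ram") GB cap"
done
# 16 GB node -> 11.2 GB cap
# 32 GB node -> 22.4 GB cap
# 64 GB node -> 44.8 GB cap
```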

Collector Scaling

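For high-volume environments, use a gateway pattern: applications (or per-node agent collectors) send OTLP to a pool of collector replicas behind a load balancer, and each replica batches the data and writes it to ClickHouse: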

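# Gateway OTel Collector config (one of several replicas behind a load balancer)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  memory_limiter:
    check_interval: 1s    # required; how often memory usage is checked
    limit_mib: 4096
    spike_limit_mib: 512
exporters:
  clickhousetraces:
    datasource: tcp://clickhouse:9000
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter should run first
      exporters: [clickhousetraces]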
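In `memory_limiter`, `limit_mib` is the hard limit and the soft limit is `limit_mib - spike_limit_mib`. Above the soft limit the processor starts refusing data, and above the hard limit it forces garbage collection. Keep the pod's own memory limit above `limit_mib`. A sketch of the arithmetic for the values above; `memory_limiter_thresholds` is a hypothetical helper.

```shell
# Hypothetical helper: derive memory_limiter thresholds in MiB.
# soft = limit_mib - spike_limit_mib (data is refused above it);
# hard = limit_mib (garbage collection is forced above it).
memory_limiter_thresholds() {
  local limit_mib=$1 spike_mib=$2
  if (( spike_mib >= limit_mib )); then
    echo "spike_limit_mib must be smaller than limit_mib" >&2
    return 1
  fi
  echo "hard=${limit_mib} soft=$(( limit_mib - spike_mib ))"
}

memory_limiter_thresholds 4096 512   # prints "hard=4096 soft=3584"
```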

Scaling

Component Scaling Guide

Component | Scaling strategy
OTel Collectors | Horizontal: add replicas behind a load balancer
Query Service | Horizontal: add replicas
ClickHouse | Shard for write throughput; replicate for read capacity and redundancy
Frontend | Horizontal (stateless)

ClickHouse Cluster Sizing

Daily ingestion | Shards | Replicas | Memory per node
< 50 GB/day | 1 | 2 | 16 GB
50–200 GB/day | 2 | 2 | 32 GB
200+ GB/day | 4+ | 2 | 64 GB
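The table can also be applied in a script. A sketch that uses the table's own boundaries (under 50, 50 to under 200, and 200 or more GB/day); `ch_cluster_size` is hypothetical and accepts whole GB/day values only.

```shell
# Hypothetical helper: map daily ingestion (whole GB/day) to the sizing table.
ch_cluster_size() {
  local gb_per_day=$1
  if (( gb_per_day < 50 )); then
    echo "shards=1 replicas=2 memory_per_node=16GB"
  elif (( gb_per_day < 200 )); then
    echo "shards=2 replicas=2 memory_per_node=32GB"
  else
    echo "shards=4+ replicas=2 memory_per_node=64GB"
  fi
}

ch_cluster_size 120   # prints "shards=2 replicas=2 memory_per_node=32GB"
```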

Monitoring

Health Checks

# Kubernetes pod status
kubectl get pods -n signoz

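# Query service health (port-forward first when running outside the cluster)
kubectl port-forward -n signoz svc/signoz-query-service 8080:8080 &
curl http://localhost:8080/api/v1/health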

# ClickHouse health
kubectl exec -n signoz $(kubectl get pod -n signoz -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "SELECT 1"
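`kubectl get pods` lists every pod, which is hard to scan on a busy cluster. A minimal sketch that filters its default output (NAME, READY, STATUS, ...) down to pods that need attention. `not_ready_pods` is a hypothetical helper, and treating Completed pods (for example one-off migration jobs) as healthy is an assumption.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods that are neither Completed nor Running with all containers
# ready (READY "x/y" with x == y). Returns non-zero if any such pod is found.
not_ready_pods() {
  awk '
    NR == 1 { next }                      # skip the header row
    {
      split($2, ready, "/")               # READY column, e.g. "1/1"
      if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2])) {
        print $1
        bad = 1
      }
    }
    END { exit bad }
  '
}

# Usage:
# kubectl get pods -n signoz | not_ready_pods || echo "some pods are not ready"
```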

Upgrades

helm repo update signoz
helm upgrade signoz signoz/signoz -n signoz -f values.yaml

# Verify
kubectl get pods -n signoz
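`kubectl get pods` shows a snapshot; to block until the upgrade has actually rolled out, one option is `kubectl rollout status` per workload. A sketch with a hypothetical `wait_for_rollouts` helper. The workload names in the usage line are taken from the restart recipes on this page and may differ in your chart version (check with `kubectl get deploy,statefulset -n signoz`).

```shell
# Hypothetical helper: wait for each listed workload to finish rolling out.
# Usage: wait_for_rollouts NAMESPACE KIND/NAME [KIND/NAME ...]
wait_for_rollouts() {
  local ns=$1
  shift
  local w
  for w in "$@"; do
    kubectl rollout status "$w" -n "$ns" --timeout=5m || return 1
  done
}

# wait_for_rollouts signoz deployment/signoz-query-service \
#   deployment/signoz-frontend deployment/signoz-otel-collector
```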

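Cleanup After Uninstall

# WARNING: this deletes every PVC in the release, including ClickHouse data.
# Run it only when removing SigNoz entirely, never as routine upgrade cleanup.
helm uninstall signoz -n signoz
kubectl -n signoz delete pvc --selector app.kubernetes.io/instance=signoz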


Commands & Recipes

Runnable commands, configuration snippets, and troubleshooting recipes for SigNoz.

Installation

Helm (Kubernetes)

helm repo add signoz https://charts.signoz.io
helm repo update
kubectl create namespace signoz
helm install signoz signoz/signoz --namespace signoz

Docker Compose

git clone https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker/clickhouse-setup
docker compose up -d

Operational Recipes

Check ClickHouse Disk Usage

kubectl exec -n signoz $(kubectl get pod -n signoz -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "
    SELECT database, table,
      formatReadableSize(sum(bytes_on_disk)) as size,
      sum(rows) as rows
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY sum(bytes_on_disk) DESC"
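Every ClickHouse recipe on this page repeats the same pod lookup. A sketch that factors it into one function, assuming the `app=clickhouse` label used above matches your ClickHouse pods (labels vary by chart version); `ch_query` is hypothetical.

```shell
# Hypothetical helper: run one SQL statement in the first ClickHouse pod.
# Assumes the app=clickhouse label used in the recipes on this page.
ch_query() {
  local pod
  pod=$(kubectl get pod -n signoz -l app=clickhouse -o name | head -1)
  if [ -z "$pod" ]; then
    echo "no ClickHouse pod found" >&2
    return 1
  fi
  kubectl exec -n signoz "$pod" -- clickhouse-client --query "$1"
}

# ch_query "SELECT 1"
```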

Adjust Retention via ClickHouse TTL

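# Convert stored timestamps to DateTime for the TTL expression.
# Check column types first with: DESCRIBE TABLE <table>

# Set traces retention to 14 days (timestamp is DateTime64 here)
kubectl exec -n signoz $(kubectl get pod -n signoz -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "
    ALTER TABLE signoz_traces.signoz_index_v2
    MODIFY TTL toDateTime(timestamp) + INTERVAL 14 DAY"

# Set logs retention to 30 days (timestamp is UInt64 nanoseconds here)
kubectl exec -n signoz $(kubectl get pod -n signoz -l app=clickhouse -o name | head -1) -- \
  clickhouse-client --query "
    ALTER TABLE signoz_logs.logs
    MODIFY TTL toDateTime(intDiv(timestamp, 1000000000)) + INTERVAL 30 DAY"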
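For other retention values, a sketch that builds the statement instead of editing it by hand. `ttl_sql` is hypothetical. The table names come from the recipes above, and the TTL expressions assume the traces `timestamp` is DateTime64 and the logs `timestamp` is UInt64 nanoseconds, so verify with `DESCRIBE TABLE` first. Pass the printed statement as the `--query` argument of the `kubectl exec` commands above.

```shell
# Hypothetical helper: print an ALTER TABLE ... MODIFY TTL statement.
# Assumes traces timestamp is DateTime64 and logs timestamp is UInt64 nanoseconds.
# Usage: ttl_sql traces|logs DAYS
ttl_sql() {
  local signal=$1 days=$2
  if ! [[ $days =~ ^[1-9][0-9]*$ ]]; then
    echo "days must be a positive integer" >&2
    return 1
  fi
  case $signal in
    traces) echo "ALTER TABLE signoz_traces.signoz_index_v2 MODIFY TTL toDateTime(timestamp) + INTERVAL ${days} DAY" ;;
    logs)   echo "ALTER TABLE signoz_logs.logs MODIFY TTL toDateTime(intDiv(timestamp, 1000000000)) + INTERVAL ${days} DAY" ;;
    *)      echo "unknown signal: $signal" >&2; return 1 ;;
  esac
}

ttl_sql traces 14
```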

View Collector Metrics

# Check OTel Collector internal metrics (port-forward runs in the background)
kubectl port-forward -n signoz svc/signoz-otel-collector 8888:8888 &
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted
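The grep above prints one line per receiver and transport. To get a single total, here is a minimal awk-based sketch. `sum_metric` is hypothetical, and exact metric names can differ between collector versions (some may carry a `_total` suffix), so confirm them in the raw output first. A climbing `otelcol_receiver_refused_spans` count can point at back-pressure, such as `memory_limiter` refusing data.

```shell
# Hypothetical helper: sum all samples of one metric from Prometheus text on
# stdin, across every label set. Assumes the value is the last field on each
# line (no sample timestamps, no spaces inside label values).
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                                   # skip HELP/TYPE lines
    NF >= 2 {
      metric = $1
      i = index(metric, "{")
      if (i > 0) metric = substr(metric, 1, i - 1)  # strip the {labels} part
      if (metric == name) total += $NF
    }
    END { printf "%.0f\n", total }
  '
}

# curl -s http://localhost:8888/metrics | sum_metric otelcol_receiver_accepted_spans
# curl -s http://localhost:8888/metrics | sum_metric otelcol_receiver_refused_spans
```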

Restart Components

# Restart query service
kubectl rollout restart deployment signoz-query-service -n signoz

# Restart frontend
kubectl rollout restart deployment signoz-frontend -n signoz

# Restart OTel Collector
kubectl rollout restart deployment signoz-otel-collector -n signoz

Check Logs

# Query service logs
kubectl logs -n signoz -l app=signoz-query-service --tail=100

# Collector logs
kubectl logs -n signoz -l app=signoz-otel-collector --tail=100

# Docker Compose
docker compose logs -f signoz-query-service
docker compose logs -f signoz-otel-collector

Upgrade

helm repo update signoz
helm upgrade signoz signoz/signoz -n signoz -f values.yaml

# Verify
kubectl get pods -n signoz

Useful ClickHouse Queries

-- Top services by trace count (last 1h)
SELECT serviceName, count() as cnt
FROM signoz_traces.signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY serviceName
ORDER BY cnt DESC
LIMIT 20;

-- Error rate by service (last 1h)
SELECT serviceName,
  countIf(statusCode = 2) / count() as error_rate
FROM signoz_traces.signoz_index_v2
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY serviceName
HAVING count() > 100
ORDER BY error_rate DESC;

-- Log volume by severity (last 1h); logs.timestamp is UInt64 nanoseconds
SELECT severity_text, count() as cnt
FROM signoz_logs.logs
WHERE timestamp >= toUnixTimestamp(now() - INTERVAL 1 HOUR) * 1000000000
GROUP BY severity_text
ORDER BY cnt DESC;
