SigNoz is a full-stack observability platform built natively on OpenTelemetry standards with ClickHouse as its high-performance analytical storage backend. It provides a unified interface for APM, distributed tracing, log management, metrics monitoring, alerting, and dashboarding — positioning itself as an open-source alternative to Datadog, New Relic, and Grafana Cloud.
Why it's better: First-class OpenTelemetry integration (no proprietary agents), a single UI across all signals, ClickHouse-backed aggregation queries that return in milliseconds at large data volumes, transparent usage-based pricing with no per-seat fees, and early support for AI/LLM observability.
When it fits (Applicability):
Teams already using or adopting OpenTelemetry
Organizations replacing Datadog/New Relic to reduce costs
Engineering teams needing correlated traces → logs → metrics in one UI
AI/ML teams needing LLM observability (token tracking, cost monitoring)
Frontend teams needing RUM (Real User Monitoring) with Core Web Vitals
Pros and Cons:
Pros: OTel-native (no proprietary agents); unified UI for all signals
Cons: ClickHouse operational complexity at scale; pre-1.0 version (rapid iteration, breaking changes)
What is the exact license classification? Enterprise features in ee/ directory have proprietary license — is the core truly open-source or source-available? — Open-core model. Code outside ee/ and cmd/enterprise/ is MIT-licensed (truly open source). The ee/ directory uses the SigNoz Enterprise License, which is proprietary/source-available — production use requires a paid subscription, though development/testing is free. See LICENSE and ee/LICENSE on GitHub.
How does SigNoz handle ClickHouse schema migrations during upgrades? — SigNoz uses a three-step migration process via signoz-otel-collector migrate: (1) bootstrap initializes schema, (2) sync up runs synchronous migrations, (3) async up runs asynchronous migrations. Replication-aware migrations require --clickhouse-replication=true. Schema migrations are run as a pre-upgrade step. See upgrade docs at signoz.io/docs/operate/migration/upgrade-standard.
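The ordering of the three migration steps can be sketched as a small helper that builds the command lines. This is only an illustration: the binary name, subcommands, and replication flag are taken from the answer above, while the exact position of the flag is an assumption and the ClickHouse connection flags are deliberately omitted. Check the upgrade docs for the real invocation before running anything.

```python
from typing import List

# Binary and subcommand names as quoted in the text above.
MIGRATOR = "signoz-otel-collector"


def migration_commands(replicated: bool = False) -> List[List[str]]:
    """Return the three migrate invocations in the order they should run.

    Assumption: the replication flag is appended to every step.
    Connection flags (ClickHouse address, credentials) are left out on purpose.
    """
    steps = [["bootstrap"], ["sync", "up"], ["async", "up"]]
    extra = ["--clickhouse-replication=true"] if replicated else []
    return [[MIGRATOR, "migrate", *step, *extra] for step in steps]
```

Calling `migration_commands(replicated=True)` yields the bootstrap, sync up, and async up invocations in that order, each as an argument list that could be handed to `subprocess.run` once the connection flags are filled in.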
What is the maximum practical ClickHouse cluster size SigNoz has been tested with? — SigNoz docs show distributed ClickHouse configurations with multiple shards and replicas (e.g., 2 shards x 2 replicas with 3 ZooKeeper nodes on Kubernetes). No published maximum cluster size; ClickHouse itself scales to hundreds of nodes in production deployments at other companies. SigNoz community reports of large deployments are limited. For production, use distributed ClickHouse with shards/replicas and ZooKeeper (3 nodes recommended).
How does OpAMP handle collector reconfiguration failure (rollback behavior)? — The OpAMP protocol does not mandate server-side rollback. When a new config is pushed, the agent validates it before applying. If validation fails, the agent reports a FAILED status and continues running with the last-known-good config (reported via effective_config). If the collector process crashes with a bad config, the supervisor can restart it with the previous configuration. The server detects failures via status reports and can re-send an older config. SigNoz uses OpAMP via opamp.yaml with a WebSocket endpoint to its query service.
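The agent-side behavior described above (validate, then either apply or keep the last-known-good config) can be sketched in a few lines. This is a toy model, not the OpAMP client API: `AgentSketch`, `on_remote_config`, and `has_pipeline` are hypothetical names, and the validation rule is invented for illustration. Only the `APPLIED`/`FAILED` status names and the `effective_config` field mirror the protocol's terminology.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, object]


@dataclass
class AgentSketch:
    """Hypothetical stand-in for an OpAMP-managed collector."""
    validate: Callable[[Config], bool]
    effective_config: Config  # last-known-good config currently in use
    status_reports: List[str] = field(default_factory=list)

    def on_remote_config(self, new_config: Config) -> str:
        """Apply new_config only if it validates; otherwise keep the old one."""
        if self.validate(new_config):
            self.effective_config = new_config
            status = "APPLIED"
        else:
            # The running config is untouched; only the failure is reported.
            status = "FAILED"
        self.status_reports.append(status)
        return status


def has_pipeline(config: Config) -> bool:
    # Toy validation rule (assumption): a config needs a non-empty "pipelines" entry.
    return bool(config.get("pipelines"))
```

With `agent = AgentSketch(validate=has_pipeline, effective_config={"pipelines": ["traces"]})`, pushing `{"pipelines": []}` returns `"FAILED"` and leaves `agent.effective_config` unchanged. The server side would see that status and could choose to re-send an older config.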
Can SigNoz ingest Prometheus recording rules directly? — Not directly. SigNoz's migration docs state that Prometheus recording rules are "not directly supported but can be managed using scheduled queries or ClickHouse materialization." SigNoz supports PromQL for querying and alerting, and can receive Prometheus metrics via the prometheus receiver or prometheusremotewrite receiver in the OTel collector. Pre-computed recording rules need to be re-implemented as SigNoz alert rules or ClickHouse materialized views.