OpenObserve

Open-source, cloud-native observability platform written in Rust, with Apache Parquet columnar storage and an S3-native architecture. The vendor positions it as offering up to 140x lower storage cost than Elasticsearch.

Overview

OpenObserve (O2) is a high-performance observability platform written in Rust that provides unified logs, metrics, traces, and Real User Monitoring (RUM). Its architecture centers on Apache Parquet columnar files kept in object storage (S3/GCS/Azure Blob), which removes the need for local SSD clusters. Queries run on the Apache DataFusion engine (built on Apache Arrow): SQL for logs and traces, PromQL for metrics.

Repository & Community

Attribute | Detail
Repository | github.com/openobserve/openobserve
Stars | ~18.5k ⭐
Latest Version | v0.70.3 stable / v0.80.0-rc3 pre-release (April 2026)
Language | Rust
License | AGPL-3.0 (copyleft)
Company | OpenObserve, Inc.
Contributors | ~109

Evaluation

  • Why it's better: Written in Rust (no GC, no JVM); Parquet columnar storage on S3, which the vendor claims cuts storage cost by up to 140x versus Elasticsearch; single-binary deployment; SQL as the primary query language (lower learning curve); stateless nodes that scale horizontally; and built-in RBAC/SSO for enterprise use.

  • When it fits (Applicability):

      • Elasticsearch/ELK replacement (logs-heavy workloads)
      • Cost-conscious organizations with petabyte-scale log volumes
      • Teams wanting SQL-based observability queries
      • Cloud-native environments with S3/GCS object storage
      • Organizations needing RBAC/SSO out of the box

  • Pros and Cons:

Pros | Cons
Rust: no GC pauses, excellent performance | AGPL-3.0 license (copyleft restrictions)
Claimed 140x lower storage cost vs Elasticsearch | Pre-1.0 version, still maturing
Single binary deployment | Smaller community than Grafana/SigNoz
SQL for logs/traces (low learning curve) | No eBPF auto-instrumentation
S3-native (effectively unbounded storage) | Less mature trace analysis than Tempo/Jaeger
Built-in RBAC and SSO | Limited PromQL (metrics only)
Stateless horizontal scaling | Fewer integrations/dashboards than Grafana
RUM / Frontend monitoring | DataFusion query engine less battle-tested

Architecture

flowchart TB
    subgraph Ingestion["Data Sources"]
        OTEL_O["OTel Collector"]
        PROM_O["Prometheus"]
        FB_O["FluentBit /<br/>Vector"]
        SDK_O["RUM SDK"]
    end

    subgraph O2["OpenObserve Cluster"]
        direction TB
        Router["Router<br/>(request dispatch)"]
        Ingester["Ingester<br/>(→ Parquet)"]
        Querier["Querier<br/>(DataFusion)"]
        Compactor["Compactor<br/>(file merging)"]
        AlertMgr["AlertManager"]
    end

    subgraph Storage_O["Storage"]
        S3["S3 / GCS /<br/>Azure Blob / MinIO"]
        Parquet["Apache Parquet<br/>columnar format"]
    end

    Ingestion --> Router --> Ingester
    Ingester --> Parquet --> S3
    Querier --> S3
    Compactor --> S3
    AlertMgr --> Querier

    style O2 fill:#e65100,color:#fff
    style Storage_O fill:#1565c0,color:#fff

Key Components

Component | Role
Router | Dispatches incoming requests to the appropriate component
Ingester | Receives telemetry, converts it to Parquet, and writes it to object storage
Querier | Runs SQL/PromQL queries on Parquet using the Apache DataFusion engine
Compactor | Merges small Parquet files for query efficiency
AlertManager | Manages alert queries, report jobs, and notifications
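
To make the Compactor's role more concrete, the sketch below groups small file sizes into merge batches that stay under a size ceiling. It is not OpenObserve's actual algorithm: the greedy strategy and the `plan_merges` name are assumptions made for illustration, and the 256 MB ceiling is simply the figure quoted under Answered Questions.

```python
from typing import List

# Ceiling for a merged file; 256 MB is the figure quoted in the Q&A section.
MAX_MERGED_BYTES = 256 * 1024 * 1024


def plan_merges(file_sizes: List[int], limit: int = MAX_MERGED_BYTES) -> List[List[int]]:
    """Greedily group consecutive file sizes so each batch totals <= limit.

    Hypothetical helper: a file larger than the limit ends up alone in its
    own batch rather than being split.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0
    for size in file_sizes:
        # Close the current batch if adding this file would exceed the limit.
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    mb = 1024 * 1024
    plan = plan_merges([100 * mb, 100 * mb, 100 * mb, 50 * mb])
    print([[size // mb for size in batch] for batch in plan])  # [[100, 100], [100, 50]]
```

Fewer, larger files mean fewer object-storage requests per query, which is the efficiency gain the table refers to.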

Key Features

Feature | Detail
Unified Logs, Metrics, Traces | All signals in one platform
Real User Monitoring (RUM) | Core Web Vitals, page load, client errors
SQL Queries | Standard SQL for logs and traces
PromQL | Prometheus-compatible metrics queries
Dashboards | Drag-and-drop, customizable panels
Alerting | Multi-signal alerts, notification pipelines
Pipelines | Ingestion-time data transformation
RBAC & SSO | Built-in enterprise access controls
Compliance | ISO 27001, SOC 2, GDPR ready
Single Binary | Trivial deployment for POC/dev
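
As an illustration of the SQL Queries feature, the sketch below builds (but does not send) an HTTP search request. Everything specific here is an assumption to verify against the API Reference listed under Sources: the `/api/{org}/_search` path, the payload field names, microsecond timestamps, HTTP Basic authentication, port 5080, and the stream name `default`. The `build_search_request` helper is hypothetical.

```python
import base64
import json
import urllib.request


def build_search_request(base_url, org, sql, start_us, end_us, user, password, size=100):
    """Build a POST request carrying a SQL query (hypothetical helper).

    Assumed payload shape: {"query": {"sql", "start_time", "end_time",
    "from", "size"}} with times in microseconds since the Unix epoch.
    """
    payload = {
        "query": {
            "sql": sql,
            "start_time": start_us,
            "end_time": end_us,
            "from": 0,
            "size": size,
        }
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/{org}/_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request(
        "http://localhost:5080", "default", 'SELECT * FROM "default"',
        0, 1_000_000, "user@example.com", "secret",
    )
    print(req.get_method(), req.full_url)
    # To actually run the query, pass req to urllib.request.urlopen().
```

Building the request separately from sending it keeps the example runnable without a live cluster.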

Pricing

Tier | Cost | Notes
Self-Hosted (Free) | $0 (AGPL-3.0) | Full features; you manage your own infrastructure
Cloud - Developer | Free | 200 GB ingestion, 15-day retention
Cloud - Pro | Usage-based (~$0.60/GB) | Logs, metrics, traces
Enterprise | Custom | Commercial license, SLA, audit logs

Compatibility

Dimension | Support
Ingestion protocols | OTLP (gRPC + HTTP), Prometheus remote_write, ES Bulk API, Kinesis Firehose, GCP Pub/Sub
Query languages | SQL (logs/traces), PromQL (metrics)
Storage backends | S3, GCS, Azure Blob, MinIO (Parquet format)
Deployment | Single binary, Docker, Kubernetes (Helm), OpenObserve Cloud
CPU architecture | amd64, arm64
Platforms | Linux, macOS, Windows
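
Since the ES Bulk API is one of the listed ingestion protocols, the sketch below shows how a bulk body is laid out: newline-delimited JSON in which each document is preceded by an action line. The NDJSON layout is the standard Elasticsearch bulk format; treating `_index` as the target OpenObserve stream name is an assumption, and `to_bulk_ndjson` is a hypothetical helper.

```python
import json
from typing import Any, Iterable, Mapping


def to_bulk_ndjson(stream: str, records: Iterable[Mapping[str, Any]]) -> str:
    """Serialize records as an Elasticsearch-style bulk body (hypothetical helper).

    Each record becomes two lines: an {"index": ...} action line, then the
    document itself. The body ends with a newline, as the bulk format requires.
    """
    lines = []
    for record in records:
        # Assumption: "_index" names the stream that receives the record.
        lines.append(json.dumps({"index": {"_index": stream}}))
        lines.append(json.dumps(dict(record)))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    body = to_bulk_ndjson(
        "app_logs",
        [
            {"level": "info", "message": "service started"},
            {"level": "error", "message": "upstream timeout"},
        ],
    )
    print(body, end="")
```

Log shippers that already emit this format (for example Fluent Bit or Vector with an Elasticsearch output) would produce equivalent bodies, which is why the Bulk API eases migration on the ingestion side.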

Sources

Tracked URLs, documentation references, and source materials for the OpenObserve folder.

Official Documentation

Source | URL | Retrieved Via
Official Website | https://openobserve.ai | Direct
Documentation | https://openobserve.ai/docs/ | Direct
Architecture | https://openobserve.ai/docs/architecture/ | Direct
HA Deployment | https://openobserve.ai/docs/ha_deployment/ | Direct
Environment Variables | https://openobserve.ai/docs/environment-variables/ | Direct
API Reference | https://openobserve.ai/docs/api/ | Direct
Pricing | https://openobserve.ai/pricing | Direct

Repositories

Repository | URL
OpenObserve (main) | https://github.com/openobserve/openobserve
O2 CLI | https://github.com/openobserve/o2-cli
Helm Charts | https://charts.openobserve.ai

Release Data

Source | URL | Retrieved Via
GitHub Releases | https://github.com/openobserve/openobserve/releases | Web Search

Community

Source | URL
Blog | https://openobserve.ai/blog/
Slack | https://short.openobserve.ai/community
LinkedIn | https://www.linkedin.com/company/openobserve

Questions

Open and answered questions about OpenObserve.

Open Questions

Answered Questions

  • What is the maximum tested ingestion rate in a real production environment? → No single maximum is published. Official docs state that OpenObserve handles hundreds of thousands of events per second per node, at 7-30 MB/sec of ingestion per vCPU core. Two high-throughput options, ZO_FEATURE_PER_THREAD_LOCK=true and OTLP gRPC ingestion, are each reported to raise throughput by 60-100%, for a combined improvement of up to ~4x (roughly 2.6x at the low end). Multi-node clusters target millions of events/sec. See observability/openobserve/operations.
  • How does DataFusion performance compare to ClickHouse for the same analytical queries? → No direct published benchmark exists. OpenObserve uses Apache DataFusion (Arrow-based) for SQL query execution, with SIMD acceleration, broadcast join optimization (up to 99.9% reduction in data transfer), and query partitioning for progressive results. ClickHouse has its own vectorized engine optimized for aggregations on columnar storage. Independent comparative benchmarks are still pending: OpenObserve claims 140x lower storage cost than Elasticsearch, and no independent verification against ClickHouse-based alternatives (e.g., SigNoz, Qryn) had been published as of April 2026.
  • What is the AGPL-3.0 impact for SaaS providers embedding OpenObserve? → AGPL-3.0 requires that anyone who runs a modified version as a network service make the full source code of that version available to the service's users. SaaS providers embedding OpenObserve as a component must either (a) release their modifications under AGPL-3.0, (b) run OpenObserve as a separate service behind an API boundary (arguably not a "derivative work"), or (c) contact OpenObserve Inc. for a commercial license exception. Legal review is strongly recommended before embedding it in proprietary SaaS products.
  • How does compaction performance scale with 10,000+ streams? → No benchmark has been published at this scale. Each stream is compacted independently, so at 10,000+ streams the compactor must manage many small Parquet files concurrently. OpenObserve merges small files into files of up to 256 MB and checks for work every 10 seconds. High stream counts increase compaction CPU and memory overhead and can exhaust file descriptors, so monitor the ZO_COMPACT settings and file handle limits. Community reports in GitHub issues suggest this is an active area of optimization.
  • Does OpenObserve support federated querying across multiple S3 buckets? → Yes, via Super Cluster mode. Multiple OpenObserve clusters (each backed by its own S3 bucket) are joined into a super cluster. A leader cluster dispatches queries to worker clusters via gRPC, each worker processes its part locally, and the results are merged. Metadata (schemas, dashboards, alerts) is synchronized across clusters. This enables cross-region and cross-bucket federated search. See observability/openobserve/architecture.
  • What is the WAL recovery behavior when an ingester crashes mid-flush? → Incoming data is written to per-org/stream-type WAL files, each paired with a Memtable. When a Memtable reaches 256 MB or its WAL reaches 128 MB, the Memtable becomes immutable and is flushed to a local Parquet file; local Parquet files are uploaded to object storage every 10 seconds. If an ingester crashes mid-flush, the WAL is replayed on restart to reconstruct the unflushed Memtables. If the WAL disk itself is lost, the data is unrecoverable unless the replication factor is greater than 1 (which writes to multiple ingesters). Partially uploaded Parquet files are handled by the immutable-to-object-storage pipeline. See observability/openobserve/architecture.
  • What is the storage format? → Apache Parquet with Zstd compression on S3/GCS/Azure. See observability/openobserve/architecture.
  • Is OpenObserve a drop-in Elasticsearch replacement? → Partially. It supports the ES Bulk API for ingestion, but the query API differs (SQL rather than Elasticsearch's Lucene-based query DSL).
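  • How does the 140x cost claim hold up? → It combines compression (~7x), storage tier (~4-5x), and replication savings (~2-3x). Multiplied together, these quoted ranges give roughly 56x to 105x, so the 140x headline presumably rests on factors or values beyond those listed here. See observability/openobserve/architecture#Benchmarks.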
  • What query languages are supported? → SQL for logs/traces, PromQL for metrics. See observability/openobserve/index#Key Features.
  • Can OpenObserve run as a single binary? → Yes, for dev/POC. Set ZO_LOCAL_MODE=true. See observability/openobserve/operations.
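
Two of the answers above rest on multiplying ranges together: the storage-cost factors and the two throughput options. The sketch below does that arithmetic using only the figures quoted in those answers; the `compound` helper is hypothetical, and treating the factors as independent multipliers is an assumption.

```python
from math import prod
from typing import Iterable, Tuple


def compound(ranges: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Multiply (low, high) factor ranges into one combined (low, high) range.

    Assumes the factors are independent, so they simply multiply.
    """
    pairs = list(ranges)
    low = prod(lo for lo, _ in pairs)
    high = prod(hi for _, hi in pairs)
    return low, high


if __name__ == "__main__":
    # Storage cost: compression ~7x, storage tier ~4-5x, replication ~2-3x.
    cost_low, cost_high = compound([(7, 7), (4, 5), (2, 3)])
    print(f"cost factors: {cost_low}x to {cost_high}x")  # cost factors: 56x to 105x

    # Throughput: two options, each reported as a 60-100% gain (1.6x to 2.0x).
    tp_low, tp_high = compound([(1.6, 2.0), (1.6, 2.0)])
    print(f"throughput: {tp_low:.2f}x to {tp_high:.2f}x")  # throughput: 2.56x to 4.00x
```

The ~4x throughput figure therefore corresponds to the top of both ranges, and the listed cost factors alone top out near 105x.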