Architecture

1. Default Topology / Flow

flowchart TB
    subgraph Sources["Data Sources"]
        OTEL_S["OTel Collector\n(OTLP gRPC/HTTP)"]
        PROM_S["Prometheus\n(remote_write)"]
        ES_S["ES Bulk API\nclients"]
        FB_S["FluentBit /\nVector"]
        KF_S["Kinesis Firehose\n/ GCP Pub/Sub"]
        RUM_S["RUM SDK\n(browser)"]
    end

    subgraph O2Cluster["OpenObserve Cluster"]
        direction TB
        subgraph Stateless["Stateless Compute"]
            Router["Router\n(request dispatch)"]
            Ingester["Ingester\n(WAL → Parquet)"]
            Querier["Querier\n(DataFusion engine)"]
            Compactor["Compactor\n(file merging)"]
            AlertMgr["AlertManager\n(alerts + reports)"]
        end

        subgraph Infra["Infrastructure"]
            WAL["WAL\n(local disk\nmemtable)"]
            Cache["Disk Cache\n(querier-side)"]
        end
    end

    subgraph Storage["Storage Layer"]
        S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
        PQ["Apache Parquet\n(Zstd compressed)"]
        Meta["Metadata Store\n(PostgreSQL / SQLite)"]
    end

    Sources --> Router
    Router --> Ingester
    Ingester --> WAL
    WAL -->|"flush every\n5min or size"| PQ
    PQ --> S3
    Querier -->|"scan"| S3
    Querier --> Cache
    Compactor -->|"merge"| S3
    AlertMgr --> Querier

    style Stateless fill:#e65100,color:#fff
    style Storage fill:#1565c0,color:#fff

Component breakdown, deployment topologies, and storage architecture for OpenObserve.

System Architecture

Node Role Architecture

flowchart LR
    subgraph Roles["ZO_NODE_ROLE"]
        ALL["all\n(single node)"]
        R["router"]
        I["ingester"]
        Q["querier"]
        C["compactor"]
        A["alertmanager"]
    end

    subgraph Groups["ZO_NODE_ROLE_GROUP"]
        Default["default\n(user queries)"]
        Background["background\n(alerts, reports)"]
    end

    R --> I
    R --> Q
    R --> A
    Q -.- Default
    A -.- Background

Role Responsibility

| Role | State | Scales | CPU Profile | Memory Profile |
|---|---|---|---|---|
| Router | Stateless | Horizontal | Low | Low |
| Ingester | WAL on disk | Horizontal | Medium | Medium (memtable) |
| Querier | Cache on disk | Horizontal | High (DataFusion) | High (scan buffers) |
| Compactor | Stateless | 1–2 nodes | Medium | Low |
| AlertManager | Stateless | 1–2 nodes | Low | Low |

Storage Architecture

Data Path

sequenceDiagram
    participant Client as Client
    participant Ingester as Ingester
    participant WAL as Local WAL
    participant S3 as Object Storage
    participant Compactor as Compactor

    Client->>Ingester: JSON / OTLP / ES Bulk
    Ingester->>Ingester: Schema inference
    Ingester->>WAL: Write to memtable (Arrow batches)
    Note over WAL: Flush triggers:<br/>5 min elapsed OR<br/>file size threshold

    WAL->>S3: Write small Parquet file
    Note over S3: Small files (1-10 MB)

    loop Background compaction
        Compactor->>S3: Read small files
        Compactor->>Compactor: Sort, merge, re-partition
        Compactor->>S3: Write large Parquet file
        Compactor->>S3: Delete old small files
    end

    Note over S3: Large files (100+ MB)<br/>Sorted by time, partitioned by stream
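
The flush rule shown in the diagram (5 minutes elapsed or a size threshold reached) can be modelled in a few lines. This is an illustrative Python sketch, not OpenObserve's implementation: the `MemTable` class, its field names, and the 8 MB threshold are assumptions; only the 5-minute interval comes from the diagram.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemTable:
    """Hypothetical in-memory buffer that flushes on age or size."""
    max_age_s: float = 300.0          # 5 minutes, as in the diagram
    max_bytes: int = 8 * 1024 * 1024  # assumed size threshold
    created_at: float = field(default_factory=time.monotonic)
    records: List[bytes] = field(default_factory=list)
    size_bytes: int = 0

    def append(self, record: bytes) -> None:
        """Buffer one record and track the buffered byte count."""
        self.records.append(record)
        self.size_bytes += len(record)

    def should_flush(self, now: Optional[float] = None) -> bool:
        """True when the buffer is non-empty and is too old OR too large."""
        now = time.monotonic() if now is None else now
        too_old = (now - self.created_at) >= self.max_age_s
        too_big = self.size_bytes >= self.max_bytes
        return bool(self.records) and (too_old or too_big)
```

An ingester loop would call `should_flush()` after each append and on a timer, then write the buffered records out as one small Parquet file and start a fresh `MemTable`.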

Parquet File Structure

| Layer | Detail |
|---|---|
| Partitioning | By organization → stream → date → time window |
| Compression | Zstd (default), high compression ratio |
| Bloom filters | Per-column, configurable for high-cardinality fields |
| Row groups | Optimized for DataFusion predicate pushdown |
| Metadata | Column statistics for partition pruning |
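
To make the partitioning hierarchy concrete, the sketch below derives an object-key prefix from an organization, a stream, and a timestamp. The exact key layout, the hourly time window, and the microsecond timestamp unit are assumptions chosen for illustration; the real layout may differ.

```python
from datetime import datetime, timezone
from typing import List

HOUR_MICROS = 3_600_000_000


def partition_prefix(org: str, stream: str, ts_micros: int) -> str:
    """Hypothetical layout: org/stream/YYYY/MM/DD/HH/ (hourly window assumed)."""
    ts = datetime.fromtimestamp(ts_micros / 1_000_000, tz=timezone.utc)
    return f"{org}/{stream}/{ts:%Y/%m/%d/%H}/"


def prefixes_for_range(org: str, stream: str,
                       start_micros: int, end_micros: int) -> List[str]:
    """List the hourly prefixes a query over [start, end] has to touch."""
    first = start_micros - start_micros % HOUR_MICROS  # round down to the hour
    return [partition_prefix(org, stream, t)
            for t in range(first, end_micros + 1, HOUR_MICROS)]
```

A query with a time filter only needs to list files under the prefixes returned by `prefixes_for_range`, which is the idea behind partition pruning.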

Query Engine: DataFusion

flowchart LR
    SQL["SQL Query"] --> Parser["SQL Parser"]
    Parser --> LP["Logical Plan"]
    LP --> Opt["Optimizer\n(predicate pushdown,\nprojection pruning,\npartition pruning)"]
    Opt --> PP["Physical Plan"]
    PP --> Scan["Parquet Scanner\n(parallel, columnar)"]
    Scan --> S3_Q["Read from S3\n(only needed cols)"]
    S3_Q --> Exec["Vectorized Execution\n(Arrow batches)"]
    Exec --> Result["Query Result"]

    style Opt fill:#2e7d32,color:#fff

HA Deployment Topology

flowchart TB
    LB["Load Balancer"]

    subgraph Routers["Router Pool"]
        R1["Router 1"]
        R2["Router 2"]
    end

    subgraph Ingesters["Ingester Pool"]
        I1["Ingester 1\n(WAL /data1)"]
        I2["Ingester 2\n(WAL /data2)"]
        I3["Ingester 3\n(WAL /data3)"]
    end

    subgraph Queriers["Querier Pool"]
        Q1["Querier 1\n(cache /cache1)"]
        Q2["Querier 2\n(cache /cache2)"]
    end

    C1["Compactor"]
    A1["AlertManager"]

    S3_HA["S3 / MinIO\n(shared storage)"]
    PG["PostgreSQL\n(metadata)"]

    LB --> Routers
    R1 --> Ingesters
    R2 --> Ingesters
    R1 --> Queriers
    R2 --> Queriers
    Ingesters --> S3_HA
    Queriers --> S3_HA
    C1 --> S3_HA
    A1 --> Queriers

    Routers --> PG
    Ingesters --> PG
    Queriers --> PG

    style S3_HA fill:#1565c0,color:#fff

Data Model

1. Default Topology / Flow

erDiagram
    Openobserve_CORE ||--o{ CONFIG : requires
    Openobserve_CORE ||--o{ STATE : writes
    CONFIG {
        string runtime_params
        string limits
    }
    STATE {
        string metric_id
        json payload
    }

How It Works

How OpenObserve uses Rust, Apache Parquet, and object storage to deliver cost-efficient observability.

Architecture

Component Roles

flowchart TB
    subgraph Sources["Data Sources"]
        OTEL["OTel Collector"]
        PROM["Prometheus"]
        FB["FluentBit / Vector"]
        ES_API["ES Bulk API\nclients"]
        RUM_SDK["RUM SDK"]
        KF["Kinesis Firehose"]
    end

    subgraph O2["OpenObserve Cluster"]
        direction TB
        Router["Router\n(request dispatch)"]
        Ingester["Ingester\n(WAL → Parquet)"]
        Querier["Querier\n(DataFusion)"]
        Compactor["Compactor\n(file merging)"]
        Alert["AlertManager\n(alerts + reports)"]
    end

    subgraph Storage["Storage Layer"]
        WAL["WAL\n(local disk)"]
        S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
        Meta["Metadata Store\n(PostgreSQL / SQLite)"]
    end

    Sources --> Router
    Router --> Ingester
    Ingester -->|batch| WAL
    WAL -->|flush| S3
    Querier --> S3
    Compactor --> S3
    Alert --> Querier

    style O2 fill:#e65100,color:#fff

Data Flow: Ingestion to Query

sequenceDiagram
    participant Client as Data Source
    participant Router as Router
    participant Ingester as Ingester
    participant WAL as Local WAL
    participant S3 as Object Storage
    participant Compactor as Compactor
    participant Querier as Querier

    Client->>Router: OTLP / ES Bulk / Prom RW
    Router->>Ingester: Route by stream
    Ingester->>Ingester: Schema inference
    Ingester->>WAL: Write to memtable
    WAL->>S3: Flush as Parquet (5min or size threshold)
    Note over S3: Small Parquet files
    Compactor->>S3: Merge small files
    Compactor->>S3: Write optimized Parquet
    Note over S3: Large, sorted Parquet files
    Querier->>S3: Read Parquet partitions
    Querier->>Querier: DataFusion vectorized execution
    Querier->>Client: Return results
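
The "Merge small files" step can be pictured as a k-way merge of already-sorted inputs. The sketch below uses in-memory lists in place of Parquet files; the `Row` shape and the `target_rows` parameter are hypothetical, and it assumes each small file is already sorted by timestamp.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, str]  # (timestamp, message); hypothetical row shape


def compact(small_files: List[List[Row]], target_rows: int) -> List[List[Row]]:
    """Merge sorted small files into larger, globally sorted files."""
    merged = heapq.merge(*small_files, key=lambda row: row[0])
    large_files: List[List[Row]] = []
    current: List[Row] = []
    for row in merged:
        current.append(row)
        if len(current) >= target_rows:  # cut a new output file
            large_files.append(current)
            current = []
    if current:
        large_files.append(current)
    return large_files
```

Because `heapq.merge` streams its inputs, the merge itself never needs to hold all input rows in memory at once, which suits a background compaction job.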

Storage Architecture

Parquet on Object Storage

OpenObserve stores ingested data as Apache Parquet files, on object storage in clustered deployments or on local disk in single-node mode:

| Aspect | Detail |
|---|---|
| Format | Apache Parquet (columnar) |
| Compression | Zstd (default) |
| Partitioning | By stream, date, and time window |
| Object storage | S3, GCS, Azure Blob, MinIO |
| Local mode | Disk-backed (for dev/single-node) |
| Bloom filters | Per-column for high-cardinality field acceleration |
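
A bloom filter answers "is this value definitely absent from the file?" without reading the column. The following is a minimal, generic bloom filter for illustration; it is not the Parquet bloom filter format, and the bit count and hash count are arbitrary assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))
```

For a high-cardinality field such as a trace ID, a querier can skip any file whose filter returns `False` for the searched value, which is why the filters help most on selective equality lookups.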

Why 140x Cheaper Than Elasticsearch

| Factor | OpenObserve | Elasticsearch |
|---|---|---|
| Compression | Parquet columnar + Zstd (~10:1) | Lucene segments (~1.5:1) |
| Storage tier | S3 ($0.023/GB/mo) | SSD ($0.10+/GB/mo) |
| Compute | Stateless, scale to zero | Always-on data nodes |
| Replicas | S3 provides 11-nines | Manually replicated shards |
| Net effect | ~$0.002/GB/mo storage | ~$0.28/GB/mo storage |

Query Engine: Apache Arrow DataFusion

OpenObserve uses Apache DataFusion, a query engine built on the Apache Arrow in-memory format, for SQL execution:

  1. SQL parsing: User query parsed into logical plan
  2. Optimization: Predicate pushdown, projection pruning, partition pruning
  3. Parquet scanning: Only reads needed columns and row groups from S3
  4. Vectorized execution: Arrow columnar batches processed in CPU cache-friendly patterns
  5. Aggregation: Final merge across partitions
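
Steps 2 and 3 above can be illustrated with a toy scanner that skips row groups using min/max statistics and returns only the requested columns. This is a conceptual Python sketch, not DataFusion code; the `RowGroup` class and the `_timestamp` integer values are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical row group with min/max statistics on _timestamp."""
    min_ts: int
    max_ts: int
    rows: List[Dict[str, object]]


def scan(row_groups: List[RowGroup], start_ts: int, end_ts: int,
         columns: List[str]) -> Tuple[List[Dict[str, object]], int]:
    """Return matching rows (projected) and the number of groups read."""
    results: List[Dict[str, object]] = []
    groups_read = 0
    for rg in row_groups:
        # Predicate pushdown: skip groups whose range cannot match.
        if rg.max_ts < start_ts or rg.min_ts > end_ts:
            continue
        groups_read += 1
        for row in rg.rows:
            if start_ts <= row["_timestamp"] <= end_ts:
                # Projection pruning: keep only the requested columns.
                results.append({c: row[c] for c in columns})
    return results, groups_read
```

In a real columnar reader the skipped groups and unrequested columns are never fetched from object storage at all, which is where most of the I/O savings come from.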

Query Language Support

| Signal | Language | Example |
|---|---|---|
| Logs | SQL | `SELECT * FROM logs WHERE body LIKE '%error%' ORDER BY _timestamp DESC LIMIT 100` |
| Traces | SQL | `SELECT * FROM traces WHERE service_name='api' AND duration > 1000` |
| Metrics | PromQL | `rate(http_requests_total[5m])` |

Node Roles

In HA mode, OpenObserve separates concerns via ZO_NODE_ROLE:

| Role | Responsibility |
|---|---|
| router | Dispatches requests to correct backend node |
| ingester | Receives data, writes WAL, flushes Parquet to S3 |
| querier | Reads Parquet from S3, executes queries |
| compactor | Merges small Parquet files for efficiency |
| alertmanager | Evaluates alert rules, generates notifications |

Workload Separation

ZO_NODE_ROLE_GROUP keeps resource-intensive background queries (alerts, reports) from slowing real-time user searches by routing the background work to a dedicated node group.
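
The routing idea can be sketched as a filter over the node list. This is a conceptual model only: the `Node` class and `queriers_for` function are hypothetical, and the group names `default` and `background` follow the node role diagram in this document, so verify the accepted values against the OpenObserve configuration reference before relying on them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    role: str        # value of ZO_NODE_ROLE
    role_group: str  # value of ZO_NODE_ROLE_GROUP (names assumed)


def queriers_for(nodes: List[Node], workload: str) -> List[Node]:
    """Pick the querier pool for a workload type (hypothetical helper)."""
    group = "background" if workload in {"alert", "report"} else "default"
    return [n for n in nodes if n.role == "querier" and n.role_group == group]
```

With this split, a heavy scheduled report can saturate the background queriers without taking capacity away from interactive searches.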

Pipelines

OpenObserve supports ingestion-time pipelines for data transformation:

  • Field extraction: Parse structured data from log lines
  • Enrichment: Add fields from lookup tables
  • Filtering: Drop unwanted logs before storage
  • Routing: Send data to different streams based on content

Pipelines use VRL (Vector Remap Language) for transformation logic.
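
The four pipeline stages listed above can be pictured as one function applied to each record. The sketch below is plain Python for illustration, not VRL and not OpenObserve's pipeline API; the log pattern, the `OWNERS` lookup table, and the stream names are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical log line shape: "<LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(r"(?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)")
OWNERS = {"api": "team-platform", "billing": "team-payments"}  # lookup table


def process(record: Dict[str, str]) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (target_stream, record), or None if the record is dropped."""
    record = dict(record)  # work on a copy, leave the input untouched
    match = LOG_PATTERN.match(record["body"])
    if match:
        record.update(match.groupdict())  # field extraction
    record["owner"] = OWNERS.get(record.get("service"), "unknown")  # enrichment
    if record.get("level") == "DEBUG":
        return None  # filtering: drop before storage
    stream = "errors" if record.get("level") == "ERROR" else "logs"  # routing
    return stream, record
```

For example, `process({"body": "ERROR api: timeout"})` is routed to the hypothetical `errors` stream with `level`, `service`, `message`, and `owner` fields attached, while a `DEBUG` line is dropped.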


Benchmarks

Storage efficiency, query performance, and cost analysis for OpenObserve.

Storage Efficiency vs Elasticsearch

Architectural Comparison

| Factor | OpenObserve | Elasticsearch |
|---|---|---|
| Storage format | Parquet (columnar) + Zstd | Lucene segments (row-oriented) + inverted index |
| Compression ratio | ~10:1 | ~1.5:1 |
| Storage tier | Object storage (S3: $0.023/GB/mo) | SSD ($0.10+/GB/mo) |
| Indexing | Optional per-column bloom filters | Full inverted index on every field |
| Net storage cost | ~$0.002/GB/mo | ~$0.28/GB/mo |
| Claimed advantage | ~140x cheaper storage | |

Why 140x Cheaper

The 140x claim combines three factors:

  1. Compression: Parquet columnar + Zstd achieves ~10:1 vs Lucene's ~1.5:1 → ~7x fewer stored bytes
  2. Storage tier: S3 ($0.023/GB/mo) vs SSD ($0.10+/GB/mo) → ~4–5x cheaper per GB
  3. No replica overhead: S3 provides 11-nines durability natively vs manually replicated Elasticsearch shards → ~2–3x savings

Combined: ~7x × ~4x × ~2.5x ≈ 70x at the midpoints, and roughly 100x at the upper end of each range (~7x × ~5x × ~3x). The 140x headline is the ratio of the net storage costs quoted above (~$0.28 vs ~$0.002 per GB per month).

Caveat: This is a vendor-provided comparison. Actual ratios depend on data patterns, compression ratios, and S3 pricing tiers.
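
The arithmetic behind the claim is easy to reproduce. The sketch below multiplies the factors using the figures quoted in this section; the 2.5x replica factor is the midpoint of the stated 2–3x range, and the function name is just for illustration.

```python
def combined_factor(compression: float, tier: float, replicas: float) -> float:
    """Multiply the independent savings factors."""
    return compression * tier * replicas


compression_gain = 10 / 1.5   # ~10:1 vs ~1.5:1, about 6.7x
tier_gain = 0.10 / 0.023      # SSD vs S3 price per GB-month, about 4.3x
replica_gain = 2.5            # midpoint of the stated 2-3x range

midpoint = combined_factor(compression_gain, tier_gain, replica_gain)
headline = 0.28 / 0.002       # ratio of the quoted net storage costs

print(f"combined from factors: ~{midpoint:.0f}x")
print(f"ratio of net costs:    ~{headline:.0f}x")
```

The gap between the two numbers shows that the headline figure rests on the quoted net costs rather than on the product of the midpoint factors.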

Query Performance

Analytical Queries (Aggregations)

| Aspect | OpenObserve | Elasticsearch |
|---|---|---|
| Column pruning | Yes (read only needed columns) | No (reads full documents) |
| Predicate pushdown | Yes (DataFusion → Parquet row group stats) | Partial (inverted index) |
| Vectorized execution | Yes (Apache Arrow batches) | No |
| Aggregation speed | Often faster for analytical patterns | Faster for full-text search |

Full-Text Search

| Aspect | OpenObserve | Elasticsearch |
|---|---|---|
| Approach | Parquet scan + bloom filters | Inverted index |
| Wildcard search | Full scan (slower) | Fast (inverted index) |
| Best for | Known-field searches, aggregations | Complex full-text search |

Resource Footprint

Single-Node (Dev/POC)

| Metric | Value |
|---|---|
| Binary size | ~50 MB |
| Startup time | < 5 seconds |
| Idle RAM | ~50–100 MB |
| Minimum resources | 1 CPU, 512 MB RAM |

Production HA

| Component | CPU | RAM | Storage |
|---|---|---|---|
| Ingester (×3) | 2 vCPU | 4 GB | 100 GB WAL disk |
| Querier (×2) | 4 vCPU | 8 GB | 50 GB cache disk |
| Compactor (×1) | 2 vCPU | 4 GB | 50 GB temp |
| Router (×2) | 1 vCPU | 1 GB | |
| AlertManager (×1) | 1 vCPU | 1 GB | |
| PostgreSQL | 1 vCPU | 2 GB | 20 GB |
| S3 | | | Unlimited |
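
Summing the sizing table gives a rough cluster-wide footprint for capacity planning. The sketch below only re-adds the numbers from the table; the dictionary layout is an assumption, and object storage is left out because it is listed as unlimited.

```python
from typing import Dict, Tuple

# component: (replicas, vCPU each, RAM GB each, local disk GB each)
SIZING: Dict[str, Tuple[int, int, int, int]] = {
    "ingester":     (3, 2, 4, 100),
    "querier":      (2, 4, 8, 50),
    "compactor":    (1, 2, 4, 50),
    "router":       (2, 1, 1, 0),
    "alertmanager": (1, 1, 1, 0),
    "postgresql":   (1, 1, 2, 20),
}


def totals(sizing: Dict[str, Tuple[int, int, int, int]]) -> Tuple[int, int, int]:
    """Return (total vCPU, total RAM in GB, total local disk in GB)."""
    vcpu = sum(n * cpu for n, cpu, _, _ in sizing.values())
    ram = sum(n * mem for n, _, mem, _ in sizing.values())
    disk = sum(n * gb for n, _, _, gb in sizing.values())
    return vcpu, ram, disk


print(totals(SIZING))
```

Changing a replica count in `SIZING` and re-running is a quick way to see how adding ingesters or queriers shifts the overall footprint.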

Cost Comparison (100 GB/day logs, 30-day retention)

| Cost Item | OpenObserve (self-hosted) | Elasticsearch (self-hosted) |
|---|---|---|
| Storage | ~$7/mo (S3, 300 GB after compression) | ~$300/mo (3 TB SSD, 3× replicated) |
| Compute | ~$500/mo (small stateless nodes) | ~$1,500/mo (3× data nodes, 64 GB each) |
| Total | ~$507/mo | ~$1,800/mo |
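
The storage and total rows can be reproduced from the scenario in the heading. The sketch below uses the prices and ratios quoted in this document; the function name is illustrative, and the Elasticsearch figure takes the table's 3 TB at $0.10/GB as given rather than deriving it.

```python
def stored_gb(daily_gb: float, retention_days: int, compression_ratio: float) -> float:
    """Compressed bytes retained at steady state."""
    return daily_gb * retention_days / compression_ratio


# OpenObserve: 100 GB/day x 30 days, ~10:1 compression, S3 at $0.023/GB/mo
oo_gb = stored_gb(100, 30, 10)
oo_storage = oo_gb * 0.023

# Elasticsearch: the table's 3 TB of SSD at $0.10/GB/mo, used as given
es_storage = 3000 * 0.10

oo_total = round(oo_storage) + 500    # plus ~$500/mo compute
es_total = round(es_storage) + 1500   # plus ~$1,500/mo compute

print(f"OpenObserve: {oo_gb:.0f} GB stored, ~${oo_storage:.2f}/mo storage, ~${oo_total}/mo total")
print(f"Elasticsearch: ~${es_storage:.0f}/mo storage, ~${es_total}/mo total")
print(f"Total cost ratio: ~{es_total / oo_total:.1f}x")
```

Note that once compute is included, the end-to-end ratio in this scenario is far smaller than the storage-only ratio, because compute dominates both totals.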

Scale Limits

| Dimension | Practical Limit | Notes |
|---|---|---|
| Daily ingestion | PB-scale | S3 write throughput bottleneck |
| Query concurrency | 50–100 | Add querier replicas |
| Retention | Unlimited | S3 lifecycle policies |
| Streams (indices) | 10,000+ | Metadata store may need PostgreSQL |
| Single query scan | TB-range | DataFusion parallelizes across partitions |

Caveats

  • The 140x cost claim comes from vendor benchmarks and combines compression, storage tier, and replication savings.
  • Full-text search performance lags behind Elasticsearch's inverted index for wildcard/fuzzy queries.
  • DataFusion is less battle-tested than ClickHouse or Elasticsearch at extreme scale.
  • Performance varies significantly with data patterns and query types.
