Architecture¶

1. Default Topology / Flow¶

flowchart TB
    subgraph Sources["Data Sources"]
        OTEL_S["OTel Collector\n(OTLP gRPC/HTTP)"]
        PROM_S["Prometheus\n(remote_write)"]
        ES_S["ES Bulk API\nclients"]
        FB_S["FluentBit /\nVector"]
        KF_S["Kinesis Firehose\n/ GCP Pub/Sub"]
        RUM_S["RUM SDK\n(browser)"]
    end

    subgraph O2Cluster["OpenObserve Cluster"]
        direction TB
        subgraph Stateless["Stateless Compute"]
            Router["Router\n(request dispatch)"]
            Ingester["Ingester\n(WAL → Parquet)"]
            Querier["Querier\n(DataFusion engine)"]
            Compactor["Compactor\n(file merging)"]
            AlertMgr["AlertManager\n(alerts + reports)"]
        end

        subgraph Infra["Infrastructure"]
            WAL["WAL\n(local disk\nmemtable)"]
            Cache["Disk Cache\n(querier-side)"]
        end
    end

    subgraph Storage["Storage Layer"]
        S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
        PQ["Apache Parquet\n(Zstd compressed)"]
        Meta["Metadata Store\n(PostgreSQL / SQLite)"]
    end

    Sources --> Router
    Router --> Ingester
    Ingester --> WAL
    WAL -->|"flush every\n5min or size"| PQ
    PQ --> S3
    Querier -->|"scan"| S3
    Querier --> Cache
    Compactor -->|"merge"| S3
    AlertMgr --> Querier

    style Stateless fill:#e65100,color:#fff
    style Storage fill:#1565c0,color:#fff

Component breakdown, deployment topologies, and storage architecture for OpenObserve.

System Architecture¶

Node Role Architecture¶

flowchart LR
    subgraph Roles["ZO_NODE_ROLE"]
        ALL["all\n(single node)"]
        R["router"]
        I["ingester"]
        Q["querier"]
        C["compactor"]
        A["alertmanager"]
    end

    subgraph Groups["ZO_NODE_ROLE_GROUP"]
        Default["default\n(user queries)"]
        Background["background\n(alerts, reports)"]
    end

    R --> I
    R --> Q
    R --> A
    Q -.- Default
    A -.- Background

Role Responsibility¶

Role	State	Scales	CPU Profile	Memory Profile
Router	Stateless	Horizontal	Low	Low
Ingester	WAL on disk	Horizontal	Medium	Medium (memtable)
Querier	Cache on disk	Horizontal	High (DataFusion)	High (scan buffers)
Compactor	Stateless	1–2 nodes	Medium	Low
AlertManager	Stateless	1–2 nodes	Low	Low

Storage Architecture¶

Data Path¶

sequenceDiagram
    participant Client as Client
    participant Ingester as Ingester
    participant WAL as Local WAL
    participant S3 as Object Storage
    participant Compactor as Compactor

    Client->>Ingester: JSON / OTLP / ES Bulk
    Ingester->>Ingester: Schema inference
    Ingester->>WAL: Write to memtable (Arrow batches)
    Note over WAL: Flush triggers:<br/>5 min elapsed OR<br/>file size threshold

    WAL->>S3: Write small Parquet file
    Note over S3: Small files (1-10 MB)

    loop Background compaction
        Compactor->>S3: Read small files
        Compactor->>Compactor: Sort, merge, re-partition
        Compactor->>S3: Write large Parquet file
        Compactor->>S3: Delete old small files
    end

    Note over S3: Large files (100+ MB)<br/>Sorted by time, partitioned by stream
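
The two flush triggers in the diagram above (elapsed time or accumulated size) can be sketched as a simple check on an in-memory buffer. This is a minimal illustration, not OpenObserve's implementation: the `Memtable` class and the 64 MB size limit are assumptions made for the example; only the 5-minute interval comes from the diagram.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

# The 5-minute interval is taken from the diagram above.
FLUSH_INTERVAL_SECS = 5 * 60
# Assumed placeholder; the actual size threshold is not stated in this document.
FLUSH_SIZE_BYTES = 64 * 1024 * 1024


@dataclass
class Memtable:
    """Hypothetical in-memory buffer that tracks its own age and size."""
    created_at: float = field(default_factory=time.monotonic)
    size_bytes: int = 0
    records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> None:
        self.records.append(record)
        self.size_bytes += len(record)

    def should_flush(self, now: Optional[float] = None) -> bool:
        """Return True once either trigger fires: age or accumulated size."""
        now = time.monotonic() if now is None else now
        too_old = now - self.created_at >= FLUSH_INTERVAL_SECS
        too_big = self.size_bytes >= FLUSH_SIZE_BYTES
        return too_old or too_big
```

An ingester loop would call `should_flush()` after each write and, when it returns True, write the buffered records out as one small Parquet file and start a fresh buffer.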

Parquet File Structure¶

Layer	Detail
Partitioning	By organization → stream → date → time window
Compression	Zstd (default), high compression ratio
Bloom filters	Per-column, configurable for high-cardinality fields
Row groups	Optimized for DataFusion predicate pushdown
Metadata	Column statistics for partition pruning

Query Engine: DataFusion¶

flowchart LR
    SQL["SQL Query"] --> Parser["SQL Parser"]
    Parser --> LP["Logical Plan"]
    LP --> Opt["Optimizer\n(predicate pushdown,\nprojection pruning,\npartition pruning)"]
    Opt --> PP["Physical Plan"]
    PP --> Scan["Parquet Scanner\n(parallel, columnar)"]
    Scan --> S3_Q["Read from S3\n(only needed cols)"]
    S3_Q --> Exec["Vectorized Execution\n(Arrow batches)"]
    Exec --> Result["Query Result"]

    style Opt fill:#2e7d32,color:#fff

HA Deployment Topology¶

flowchart TB
    LB["Load Balancer"]

    subgraph Routers["Router Pool"]
        R1["Router 1"]
        R2["Router 2"]
    end

    subgraph Ingesters["Ingester Pool"]
        I1["Ingester 1\n(WAL /data1)"]
        I2["Ingester 2\n(WAL /data2)"]
        I3["Ingester 3\n(WAL /data3)"]
    end

    subgraph Queriers["Querier Pool"]
        Q1["Querier 1\n(cache /cache1)"]
        Q2["Querier 2\n(cache /cache2)"]
    end

    C1["Compactor"]
    A1["AlertManager"]

    S3_HA["S3 / MinIO\n(shared storage)"]
    PG["PostgreSQL\n(metadata)"]

    LB --> Routers
    R1 --> Ingesters
    R2 --> Ingesters
    R1 --> Queriers
    R2 --> Queriers
    Ingesters --> S3_HA
    Queriers --> S3_HA
    C1 --> S3_HA
    A1 --> Queriers

    Routers --> PG
    Ingesters --> PG
    Queriers --> PG

    style S3_HA fill:#1565c0,color:#fff

Data Model¶

Entity Relationships¶

erDiagram
    Openobserve_CORE ||--o{ CONFIG : requires
    Openobserve_CORE ||--o{ STATE : writes
    CONFIG {
        string runtime_params
        string limits
    }
    STATE {
        string metric_id
        json payload
    }

How It Works¶

How OpenObserve uses Rust, Apache Parquet, and object storage to deliver cost-efficient observability.

Architecture¶

Component Roles¶

flowchart TB
    subgraph Sources["Data Sources"]
        OTEL["OTel Collector"]
        PROM["Prometheus"]
        FB["FluentBit / Vector"]
        ES_API["ES Bulk API\nclients"]
        RUM_SDK["RUM SDK"]
        KF["Kinesis Firehose"]
    end

    subgraph O2["OpenObserve Cluster"]
        direction TB
        Router["Router\n(request dispatch)"]
        Ingester["Ingester\n(WAL → Parquet)"]
        Querier["Querier\n(DataFusion)"]
        Compactor["Compactor\n(file merging)"]
        Alert["AlertManager\n(alerts + reports)"]
    end

    subgraph Storage["Storage Layer"]
        WAL["WAL\n(local disk)"]
        S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
        Meta["Metadata Store\n(PostgreSQL / SQLite)"]
    end

    Sources --> Router
    Router --> Ingester
    Ingester -->|batch| WAL
    WAL -->|flush| S3
    Querier --> S3
    Compactor --> S3
    Alert --> Querier

    style O2 fill:#e65100,color:#fff

Data Flow: Ingestion to Query¶

sequenceDiagram
    participant Client as Data Source
    participant Router as Router
    participant Ingester as Ingester
    participant WAL as Local WAL
    participant S3 as Object Storage
    participant Compactor as Compactor
    participant Querier as Querier

    Client->>Router: OTLP / ES Bulk / Prom RW
    Router->>Ingester: Route by stream
    Ingester->>Ingester: Schema inference
    Ingester->>WAL: Write to memtable
    WAL->>S3: Flush as Parquet (5min or size threshold)
    Note over S3: Small Parquet files
    Compactor->>S3: Merge small files
    Compactor->>S3: Write optimized Parquet
    Note over S3: Large, sorted Parquet files
    Querier->>S3: Read Parquet partitions
    Querier->>Querier: DataFusion vectorized execution
    Querier->>Client: Return results
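
The compaction step (merge small files, write one sorted file) amounts to a k-way merge by timestamp. The following is a sketch under simplifying assumptions: each "file" is an in-memory list of records carrying a `_timestamp` field, and the `compact` function is a hypothetical name, not part of OpenObserve.

```python
import heapq
from typing import Dict, Iterable, List

Record = Dict[str, int]  # each record carries an integer "_timestamp"


def compact(small_files: Iterable[List[Record]]) -> List[Record]:
    """Merge several small files into one run sorted by _timestamp."""
    # Sort each input run first, then merge the sorted runs lazily.
    sorted_runs = [sorted(f, key=lambda r: r["_timestamp"]) for f in small_files]
    return list(heapq.merge(*sorted_runs, key=lambda r: r["_timestamp"]))


files = [[{"_timestamp": 3}, {"_timestamp": 1}], [{"_timestamp": 2}], []]
merged = compact(files)
```

A real compactor would stream row groups rather than hold whole files in memory, and would delete the inputs only after the merged file has been written.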

Storage Architecture¶

Parquet on Object Storage¶

OpenObserve stores all data as Apache Parquet files on object storage:

Aspect	Detail
Format	Apache Parquet (columnar)
Compression	Zstd (default)
Partitioning	By stream, date, and time window
Object storage	S3, GCS, Azure Blob, MinIO
Local mode	Disk-backed (for dev/single-node)
Bloom filters	Per-column for high-cardinality field acceleration
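
To illustrate why a per-column bloom filter helps with high-cardinality fields, here is a minimal, self-contained filter. It is a teaching sketch, not the Parquet bloom filter format: the bit-array size, the number of hashes, and the use of SHA-256 are arbitrary choices for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, value: str) -> Iterator[int]:
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


bf = BloomFilter()
bf.add("trace-abc")
```

If `might_contain` returns False for a looked-up value, the file (or row group) the filter belongs to cannot hold that value and can be skipped entirely.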

Why 140x Cheaper Than Elasticsearch¶

Factor	OpenObserve	Elasticsearch
Compression	Parquet columnar + Zstd (~10:1)	Lucene segments (~1.5:1)
Storage tier	S3 ($0.023/GB/mo)	SSD ($0.10+/GB/mo)
Compute	Stateless, scale to zero	Always-on data nodes
Replicas	S3 provides 11-nines durability	Manually replicated shards
Net effect	~$0.002/GB/mo storage	~$0.28/GB/mo storage

Query Engine: Apache Arrow DataFusion¶

OpenObserve uses Apache DataFusion, a query engine built on Apache Arrow, for SQL execution:

SQL parsing: User query parsed into logical plan
Optimization: Predicate pushdown, projection pruning, partition pruning
Parquet scanning: Only reads needed columns and row groups from S3
Vectorized execution: Arrow columnar batches processed in CPU cache-friendly patterns
Aggregation: Final merge across partitions
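
The effect of predicate pushdown and projection pruning can be shown with a toy scanner. This sketch does not use DataFusion; the `RowGroup` type and `scan` function are hypothetical, and the row-group statistics are reduced to a min/max on `_timestamp`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RowGroup:
    """Hypothetical row group: min/max timestamp plus columnar data."""
    min_ts: int
    max_ts: int
    columns: Dict[str, list]  # column name -> values


def scan(row_groups: List[RowGroup], wanted_columns: List[str],
         ts_from: int, ts_to: int) -> Dict[str, list]:
    """Read only the wanted columns from row groups overlapping the range."""
    out: Dict[str, list] = {name: [] for name in wanted_columns}
    for rg in row_groups:
        # Predicate pushdown: skip the row group using statistics alone.
        if rg.max_ts < ts_from or rg.min_ts > ts_to:
            continue
        keep = [i for i, t in enumerate(rg.columns["_timestamp"])
                if ts_from <= t <= ts_to]
        # Projection pruning: touch only the requested columns.
        for name in wanted_columns:
            col = rg.columns[name]
            out[name].extend(col[i] for i in keep)
    return out


groups = [
    RowGroup(1, 3, {"_timestamp": [1, 2, 3], "level": ["a", "b", "c"], "body": ["x", "y", "z"]}),
    RowGroup(10, 11, {"_timestamp": [10, 11], "level": ["d", "e"], "body": ["p", "q"]}),
]
result = scan(groups, ["level"], 2, 5)
```

In this example the second row group is never opened and the `body` column is never read, which is the saving the steps above describe.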

Query Language Support¶

Signal	Language	Example
Logs	SQL	`SELECT * FROM logs WHERE body LIKE '%error%' ORDER BY _timestamp DESC LIMIT 100`
Traces	SQL	`SELECT * FROM traces WHERE service_name='api' AND duration > 1000`
Metrics	PromQL	`rate(http_requests_total[5m])`

Node Roles¶

In HA mode, OpenObserve separates concerns via ZO_NODE_ROLE:

Role	Responsibility
`router`	Dispatches requests to correct backend node
`ingester`	Receives data, writes WAL, flushes Parquet to S3
`querier`	Reads Parquet from S3, executes queries
`compactor`	Merges small Parquet files for efficiency
`alertmanager`	Evaluates alert rules, generates notifications

Workload Separation¶

ZO_NODE_ROLE_GROUP prevents resource-intensive queries (alerts, reports) from impacting real-time user searches by routing them to dedicated node groups.
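
The idea behind workload separation can be sketched as a dispatcher that keeps one querier pool per group. The group names `default` and `background` follow the diagrams in this document; the `QuerierPools` class, the request kinds, and the node names are hypothetical.

```python
import itertools
from typing import Dict, List


class QuerierPools:
    """Round-robin over queriers, with one pool per role group."""

    def __init__(self, pools: Dict[str, List[str]]) -> None:
        self._cycles = {group: itertools.cycle(nodes)
                        for group, nodes in pools.items()}

    def pick(self, request_kind: str) -> str:
        # Alerts and reports go to the background group; everything else
        # is treated as an interactive user query.
        group = "background" if request_kind in {"alert", "report"} else "default"
        return next(self._cycles[group])


pools = QuerierPools({"default": ["querier-1", "querier-2"],
                      "background": ["querier-3"]})
picked = [pools.pick(kind) for kind in ("search", "search", "alert", "search")]
```

Because the pools never overlap, a burst of scheduled alert evaluations cannot take scan capacity away from user searches.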

Pipelines¶

OpenObserve supports ingestion-time pipelines for data transformation:

Field extraction: Parse structured data from log lines
Enrichment: Add fields from lookup tables
Filtering: Drop unwanted logs before storage
Routing: Send data to different streams based on content

Pipelines use VRL (Vector Remap Language) for transformation logic.
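
The four pipeline stages can be illustrated in Python for readers unfamiliar with VRL. This is not VRL and not OpenObserve's pipeline API: the log format, the `SERVICE_OWNERS` lookup table, and the stream names are all invented for the example.

```python
import re
from typing import Dict, Optional

# Assumed log line shape: "LEVEL service: message".
LOG_PATTERN = re.compile(r"(?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)")
SERVICE_OWNERS = {"checkout": "payments-team"}  # hypothetical lookup table


def run_pipeline(line: str) -> Optional[Dict[str, str]]:
    """Return the transformed record, or None if the line is dropped."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # unparseable lines are dropped in this sketch
    record = match.groupdict()                                   # field extraction
    record["owner"] = SERVICE_OWNERS.get(record["service"], "unknown")  # enrichment
    if record["level"] == "DEBUG":                               # filtering
        return None
    # Routing: pick the destination stream based on content.
    record["stream"] = "errors" if record["level"] == "ERROR" else "default"
    return record


routed = run_pipeline("ERROR checkout: card declined")
dropped = run_pipeline("DEBUG checkout: cache miss")
```

Running the transformation at ingestion time means dropped records never reach storage and routed records land in the right stream without a later rewrite.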

Benchmarks¶

Storage efficiency, query performance, and cost analysis for OpenObserve.

Storage Efficiency vs Elasticsearch¶

Architectural Comparison¶

Factor	OpenObserve	Elasticsearch
Storage format	Parquet (columnar) + Zstd	Lucene segments (row-oriented) + inverted index
Compression ratio	~10:1	~1.5:1
Storage tier	Object storage (S3: $0.023/GB/mo)	SSD ($0.10+/GB/mo)
Indexing	Optional per-column bloom filters	Full inverted index on every field
Net storage cost	~$0.002/GB/mo	~$0.28/GB/mo
Claimed advantage	~140x cheaper storage	—

Why 140x Cheaper¶

The 140x claim combines three factors:

Compression: Parquet columnar + Zstd achieves ~10:1 vs Lucene's ~1.5:1 → ~7x less raw bytes
Storage tier: S3 ($0.023/GB/mo) vs SSD ($0.10+/GB/mo) → ~4–5x cheaper per GB
No replica overhead: S3 provides 11-nines durability natively vs manually replicated Elasticsearch shards → ~2–3x savings

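Combined: ~7x × ~4x × ~2.5x ≈ ~70x at the midpoints. The quoted ranges multiply out to roughly 56–105x, so the 140x headline corresponds to the ratio of the net storage cost estimates (~$0.28 vs ~$0.002 per GB per month) rather than to this product.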
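
The arithmetic can be checked directly. The snippet below multiplies the low and high ends of the three quoted factors and, separately, takes the ratio of the two net storage cost figures from the comparison table; the helper name `combined_range` is only for illustration.

```python
from typing import Tuple


def combined_range(*factor_ranges: Tuple[float, float]) -> Tuple[float, float]:
    """Multiply the low ends and the high ends of each (low, high) factor."""
    low = high = 1.0
    for lo, hi in factor_ranges:
        low *= lo
        high *= hi
    return low, high


# Compression ~7x, storage tier ~4-5x, replication ~2-3x (figures quoted above).
low, high = combined_range((7, 7), (4, 5), (2, 3))

# Ratio of the net storage cost estimates: ~$0.28 vs ~$0.002 per GB per month.
net_cost_ratio = 0.28 / 0.002

print(f"product of factors: {low:.0f}x to {high:.0f}x")
print(f"net cost ratio: {net_cost_ratio:.0f}x")
```

The product of the factors spans 56x to 105x, while the net cost ratio is 140x, so the headline number is best read as the upper bound of a vendor estimate.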

Caveat: This is a vendor-provided comparison. Actual savings depend on data patterns, achieved compression ratios, and S3 pricing tiers.

Query Performance¶

Analytical Queries (Aggregations)¶

Aspect	OpenObserve	Elasticsearch
Column pruning	Yes (read only needed columns)	No (reads full documents)
Predicate pushdown	Yes (DataFusion → Parquet row group stats)	Partial (inverted index)
Vectorized execution	Yes (Apache Arrow batches)	No
Aggregation speed	Often faster for analytical patterns	Faster for full-text search

Full-Text Search¶

Aspect	OpenObserve	Elasticsearch
Approach	Parquet scan + bloom filters	Inverted index
Wildcard search	Full scan (slower)	Fast (inverted index)
Best for	Known-field searches, aggregations	Complex full-text search

Resource Footprint¶

Single-Node (Dev/POC)¶

Metric	Value
Binary size	~50 MB
Startup time	< 5 seconds
Idle RAM	~50–100 MB
Minimum resources	1 CPU, 512 MB RAM

Production HA¶

Component	CPU	RAM	Storage
Ingester (×3)	2 vCPU	4 GB	100 GB WAL disk
Querier (×2)	4 vCPU	8 GB	50 GB cache disk
Compactor (×1)	2 vCPU	4 GB	50 GB temp
Router (×2)	1 vCPU	1 GB	—
AlertManager (×1)	1 vCPU	1 GB	—
PostgreSQL	1 vCPU	2 GB	20 GB
S3	—	—	Unlimited
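
For capacity planning, the per-component figures in the table can be summed. This is plain arithmetic over the table values; the dictionary layout is just a convenient way to hold them.

```python
# component -> (replica count, vCPU per replica, RAM in GB per replica),
# copied from the sizing table above.
NODES = {
    "ingester": (3, 2, 4),
    "querier": (2, 4, 8),
    "compactor": (1, 2, 4),
    "router": (2, 1, 1),
    "alertmanager": (1, 1, 1),
    "postgresql": (1, 1, 2),
}

total_vcpu = sum(count * vcpu for count, vcpu, _ in NODES.values())
total_ram_gb = sum(count * ram for count, _, ram in NODES.values())

print(f"total: {total_vcpu} vCPU, {total_ram_gb} GB RAM")
```

The reference topology therefore needs about 20 vCPU and 37 GB of RAM in total, excluding object storage.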

Cost Comparison (100 GB/day logs, 30-day retention)¶

Cost Item	OpenObserve (self-hosted)	Elasticsearch (self-hosted)
Storage	~$7/mo (S3, 300 GB after compression)	~$300/mo (3 TB SSD, 3× replicated)
Compute	~$500/mo (small stateless nodes)	~$1,500/mo (3× data nodes, 64 GB each)
Total	~$507/mo	~$1,800/mo
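
The OpenObserve column can be reproduced from the stated inputs. The sketch assumes the ~10:1 compression ratio and the $0.023/GB/mo S3 price quoted earlier in this document; the compute figures and the Elasticsearch storage figure are taken from the table as given rather than derived.

```python
S3_PRICE_PER_GB_MONTH = 0.023  # S3 price quoted in this document


def compressed_gb(daily_gb: float, retention_days: int,
                  compression_ratio: float) -> float:
    """Steady-state stored volume after compression."""
    return daily_gb * retention_days / compression_ratio


# 100 GB/day, 30-day retention, ~10:1 compression (assumed ratio from above).
stored_gb = compressed_gb(100, 30, 10.0)
storage_cost = stored_gb * S3_PRICE_PER_GB_MONTH

# Compute costs are taken from the table, not derived here.
openobserve_total = storage_cost + 500
elasticsearch_total = 300 + 1500

print(f"stored: {stored_gb:.0f} GB, storage: ${storage_cost:.2f}/mo")
print(f"totals: ${openobserve_total:.0f}/mo vs ${elasticsearch_total}/mo")
```

Storage is a small share of the OpenObserve total in this scenario, so the overall gap between the two columns is driven mostly by the compute estimates.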

Scale Limits¶

Dimension	Practical Limit	Notes
Daily ingestion	PB-scale	Bounded by S3 write throughput
Query concurrency	50–100	Add querier replicas
Retention	Unlimited	S3 lifecycle policies
Streams (indices)	10,000+	At this scale the metadata store may need PostgreSQL rather than SQLite
Single query scan	TB-range	DataFusion parallelizes across partitions

Caveats¶

The 140x cost claim comes from vendor benchmarks and combines compression, storage-tier, and replication savings.
Full-text search performance lags behind Elasticsearch's inverted index for wildcard/fuzzy queries.
DataFusion is less battle-tested than ClickHouse or Elasticsearch at extreme scale.
Performance varies significantly with data patterns and query types.