Architecture
1. Default Topology / Flow
flowchart TB
subgraph Sources["Data Sources"]
OTEL_S["OTel Collector\n(OTLP gRPC/HTTP)"]
PROM_S["Prometheus\n(remote_write)"]
ES_S["ES Bulk API\nclients"]
FB_S["FluentBit /\nVector"]
KF_S["Kinesis Firehose\n/ GCP Pub/Sub"]
RUM_S["RUM SDK\n(browser)"]
end
subgraph O2Cluster["OpenObserve Cluster"]
direction TB
subgraph Stateless["Stateless Compute"]
Router["Router\n(request dispatch)"]
Ingester["Ingester\n(WAL → Parquet)"]
Querier["Querier\n(DataFusion engine)"]
Compactor["Compactor\n(file merging)"]
AlertMgr["AlertManager\n(alerts + reports)"]
end
subgraph Infra["Infrastructure"]
WAL["WAL\n(local disk\nmemtable)"]
Cache["Disk Cache\n(querier-side)"]
end
end
subgraph Storage["Storage Layer"]
S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
PQ["Apache Parquet\n(Zstd compressed)"]
Meta["Metadata Store\n(PostgreSQL / SQLite)"]
end
Sources --> Router
Router --> Ingester
Ingester --> WAL
WAL -->|"flush every\n5min or size"| PQ
PQ --> S3
Querier -->|"scan"| S3
Querier --> Cache
Compactor -->|"merge"| S3
AlertMgr --> Querier
style Stateless fill:#e65100,color:#fff
style Storage fill:#1565c0,color:#fff
Component breakdown, deployment topologies, and storage architecture for OpenObserve.
System Architecture
Node Role Architecture
flowchart LR
subgraph Roles["ZO_NODE_ROLE"]
ALL["all\n(single node)"]
R["router"]
I["ingester"]
Q["querier"]
C["compactor"]
A["alertmanager"]
end
subgraph Groups["ZO_NODE_ROLE_GROUP"]
Default["default\n(user queries)"]
Background["background\n(alerts, reports)"]
end
R --> I
R --> Q
R --> A
Q -.- Default
A -.- Background
Role Responsibility
| Role | State | Scales | CPU Profile | Memory Profile |
| --- | --- | --- | --- | --- |
| Router | Stateless | Horizontal | Low | Low |
| Ingester | WAL on disk | Horizontal | Medium | Medium (memtable) |
| Querier | Cache on disk | Horizontal | High (DataFusion) | High (scan buffers) |
| Compactor | Stateless | 1–2 nodes | Medium | Low |
| AlertManager | Stateless | 1–2 nodes | Low | Low |
Storage Architecture
Data Path
sequenceDiagram
participant Client as Client
participant Ingester as Ingester
participant WAL as Local WAL
participant S3 as Object Storage
participant Compactor as Compactor
Client->>Ingester: JSON / OTLP / ES Bulk
Ingester->>Ingester: Schema inference
Ingester->>WAL: Write to memtable (Arrow batches)
Note over WAL: Flush triggers:<br/>5 min elapsed OR<br/>file size threshold
WAL->>S3: Write small Parquet file
Note over S3: Small files (1-10 MB)
loop Background compaction
Compactor->>S3: Read small files
Compactor->>Compactor: Sort, merge, re-partition
Compactor->>S3: Write large Parquet file
Compactor->>S3: Delete old small files
end
Note over S3: Large files (100+ MB)<br/>Sorted by time, partitioned by stream
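The two flush triggers in the diagram (elapsed time or size threshold) can be sketched as a small predicate. This is a minimal illustration, not OpenObserve's implementation: the `Memtable` class, the `should_flush` function, and the 10 MB size value are assumptions, since the source names a size threshold without giving its value.

```python
from dataclasses import dataclass

# The 5-minute interval comes from the diagram; the size threshold is a
# hypothetical value chosen only for this sketch.
FLUSH_INTERVAL_SECONDS = 5 * 60
FLUSH_SIZE_BYTES = 10 * 1024 * 1024


@dataclass
class Memtable:
    created_at: float  # seconds since epoch when the first record arrived
    size_bytes: int    # bytes currently buffered


def should_flush(table: Memtable, now: float,
                 max_age: float = FLUSH_INTERVAL_SECONDS,
                 max_bytes: int = FLUSH_SIZE_BYTES) -> bool:
    """Return True when either flush trigger fires (age OR size)."""
    too_old = (now - table.created_at) >= max_age
    too_big = table.size_bytes >= max_bytes
    return too_old or too_big
```

A nearly empty memtable is still flushed once it is five minutes old, and a full one is flushed immediately, which is why the flushed files vary in size before compaction.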
Parquet File Structure
| Layer | Detail |
| --- | --- |
| Partitioning | By organization → stream → date → time window |
| Compression | Zstd (default), high compression ratio |
| Bloom filters | Per-column, configurable for high-cardinality fields |
| Row groups | Optimized for DataFusion predicate pushdown |
| Metadata | Column statistics for partition pruning |
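To make the partitioning row concrete, the sketch below derives an object-storage prefix from an organization, a stream, and a record timestamp. The exact key layout is an assumption made for illustration (the source only gives the order organization → stream → date → time window), and `partition_prefix` is a hypothetical helper that uses an hourly window.

```python
from datetime import datetime, timezone


def partition_prefix(org: str, stream: str, ts_micros: int) -> str:
    """Build a hypothetical prefix of the form org/stream/YYYY/MM/DD/HH.

    ts_micros is a Unix timestamp in microseconds, interpreted as UTC.
    """
    ts = datetime.fromtimestamp(ts_micros / 1_000_000, tz=timezone.utc)
    return f"{org}/{stream}/{ts.strftime('%Y/%m/%d/%H')}"
```

With a layout like this, a query restricted to one hour of one stream only needs to list objects under a single prefix, which is the basis for partition pruning.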
Query Engine: DataFusion
flowchart LR
SQL["SQL Query"] --> Parser["SQL Parser"]
Parser --> LP["Logical Plan"]
LP --> Opt["Optimizer\n(predicate pushdown,\nprojection pruning,\npartition pruning)"]
Opt --> PP["Physical Plan"]
PP --> Scan["Parquet Scanner\n(parallel, columnar)"]
Scan --> S3_Q["Read from S3\n(only needed cols)"]
S3_Q --> Exec["Vectorized Execution\n(Arrow batches)"]
Exec --> Result["Query Result"]
style Opt fill:#2e7d32,color:#fff
HA Deployment Topology
flowchart TB
LB["Load Balancer"]
subgraph Routers["Router Pool"]
R1["Router 1"]
R2["Router 2"]
end
subgraph Ingesters["Ingester Pool"]
I1["Ingester 1\n(WAL /data1)"]
I2["Ingester 2\n(WAL /data2)"]
I3["Ingester 3\n(WAL /data3)"]
end
subgraph Queriers["Querier Pool"]
Q1["Querier 1\n(cache /cache1)"]
Q2["Querier 2\n(cache /cache2)"]
end
C1["Compactor"]
A1["AlertManager"]
S3_HA["S3 / MinIO\n(shared storage)"]
PG["PostgreSQL\n(metadata)"]
LB --> Routers
R1 --> Ingesters
R2 --> Ingesters
R1 --> Queriers
R2 --> Queriers
Ingesters --> S3_HA
Queriers --> S3_HA
C1 --> S3_HA
A1 --> Queriers
Routers --> PG
Ingesters --> PG
Queriers --> PG
style S3_HA fill:#1565c0,color:#fff
Data Model
1. Default Topology / Flow
erDiagram
Openobserve_CORE ||--o{ CONFIG : requires
Openobserve_CORE ||--o{ STATE : writes
CONFIG {
string runtime_params
string limits
}
STATE {
string metric_id
json payload
}
How It Works
How OpenObserve uses Rust, Apache Parquet, and object storage to deliver cost-efficient observability.
Architecture
Component Roles
flowchart TB
subgraph Sources["Data Sources"]
OTEL["OTel Collector"]
PROM["Prometheus"]
FB["FluentBit / Vector"]
ES_API["ES Bulk API\nclients"]
RUM_SDK["RUM SDK"]
KF["Kinesis Firehose"]
end
subgraph O2["OpenObserve Cluster"]
direction TB
Router["Router\n(request dispatch)"]
Ingester["Ingester\n(WAL → Parquet)"]
Querier["Querier\n(DataFusion)"]
Compactor["Compactor\n(file merging)"]
Alert["AlertManager\n(alerts + reports)"]
end
subgraph Storage["Storage Layer"]
WAL["WAL\n(local disk)"]
S3["Object Storage\n(S3/GCS/Azure/MinIO)"]
Meta["Metadata Store\n(PostgreSQL / SQLite)"]
end
Sources --> Router
Router --> Ingester
Ingester -->|batch| WAL
WAL -->|flush| S3
Querier --> S3
Compactor --> S3
Alert --> Querier
style O2 fill:#e65100,color:#fff
Data Flow: Ingestion to Query
sequenceDiagram
participant Client as Data Source
participant Router as Router
participant Ingester as Ingester
participant WAL as Local WAL
participant S3 as Object Storage
participant Compactor as Compactor
participant Querier as Querier
Client->>Router: OTLP / ES Bulk / Prom RW
Router->>Ingester: Route by stream
Ingester->>Ingester: Schema inference
Ingester->>WAL: Write to memtable
WAL->>S3: Flush as Parquet (5min or size threshold)
Note over S3: Small Parquet files
Compactor->>S3: Merge small files
Compactor->>S3: Write optimized Parquet
Note over S3: Large, sorted Parquet files
Querier->>S3: Read Parquet partitions
Querier->>Querier: DataFusion vectorized execution
Querier->>Client: Return results
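The compaction step in the flow above (merge small files into large, sorted ones) can be sketched with in-memory lists standing in for Parquet files. This is an illustration under stated assumptions: `compact` is a hypothetical function, records are plain dicts, and only the `_timestamp` field name is taken from the query examples on this page.

```python
import heapq
from typing import Iterable

Record = dict  # each record carries a "_timestamp" field


def compact(small_files: Iterable[list[Record]]) -> list[Record]:
    """Merge several small batches into one batch sorted by _timestamp."""
    def by_time(record: Record) -> int:
        return record["_timestamp"]

    # Sort each small file on its own, then do a k-way merge of the sorted runs.
    sorted_runs = [sorted(batch, key=by_time) for batch in small_files]
    return list(heapq.merge(*sorted_runs, key=by_time))
```

A k-way merge keeps the output ordered by time without re-sorting everything at once, which matches the "large, sorted Parquet files" the diagram describes.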
Storage Architecture
Parquet on Object Storage
OpenObserve stores all data as Apache Parquet files on object storage:
| Aspect | Detail |
| --- | --- |
| Format | Apache Parquet (columnar) |
| Compression | Zstd (default) |
| Partitioning | By stream, date, and time window |
| Object storage | S3, GCS, Azure Blob, MinIO |
| Local mode | Disk-backed (for dev/single-node) |
| Bloom filters | Per-column for high-cardinality field acceleration |
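The bloom filter row is easier to follow with a toy example. The class below is a generic textbook Bloom filter, not the Parquet or OpenObserve implementation; the bit count, hash count, and the use of SHA-256 are choices made only for this sketch.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership sketch: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))
```

If a file's filter for a column such as a trace ID answers "definitely not present", the scanner can skip that file without reading the column, which is where the acceleration for high-cardinality fields comes from.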
Why 140x Cheaper Than Elasticsearch
| Factor | OpenObserve | Elasticsearch |
| --- | --- | --- |
| Compression | Parquet columnar + Zstd (~10:1) | Lucene segments (~1.5:1) |
| Storage tier | S3 ($0.023/GB/mo) | SSD ($0.10+/GB/mo) |
| Compute | Stateless, scale to zero | Always-on data nodes |
| Replicas | S3 provides 11-nines | Manually replicated shards |
| Net effect | ~$0.002/GB/mo storage | ~$0.28/GB/mo storage |
Query Engine: Apache Arrow DataFusion
OpenObserve uses DataFusion (Apache Arrow's query engine) for SQL execution:
- SQL parsing: User query parsed into logical plan
- Optimization: Predicate pushdown, projection pruning, partition pruning
- Parquet scanning: Only reads needed columns and row groups from S3
- Vectorized execution: Arrow columnar batches processed in CPU cache-friendly patterns
- Aggregation: Final merge across partitions
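The "only reads needed columns and row groups" step relies on per-row-group min/max statistics. The sketch below shows the idea in plain Python; it is not DataFusion's API, and `RowGroupStats` and `prune_row_groups` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    row_group: int   # index of the row group within the file
    min_value: int   # smallest value of the filtered column in this group
    max_value: int   # largest value of the filtered column in this group


def prune_row_groups(stats: list[RowGroupStats], lower: int, upper: int) -> list[int]:
    """Return the row groups whose [min, max] range overlaps [lower, upper].

    Groups that cannot contain a matching row are skipped without being read.
    """
    return [s.row_group for s in stats
            if s.max_value >= lower and s.min_value <= upper]
```

For a filter such as a time range on `_timestamp`, a file sorted by time has narrow, non-overlapping ranges per row group, so most groups are eliminated before any data is fetched from S3.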
Query Language Support
| Signal | Language | Example |
| --- | --- | --- |
| Logs | SQL | SELECT * FROM logs WHERE body LIKE '%error%' ORDER BY _timestamp DESC LIMIT 100 |
| Traces | SQL | SELECT * FROM traces WHERE service_name='api' AND duration > 1000 |
| Metrics | PromQL | rate(http_requests_total[5m]) |
Node Roles
In HA mode, OpenObserve separates concerns via ZO_NODE_ROLE:
| Role | Responsibility |
| --- | --- |
| router | Dispatches requests to correct backend node |
| ingester | Receives data, writes WAL, flushes Parquet to S3 |
| querier | Reads Parquet from S3, executes queries |
| compactor | Merges small Parquet files for efficiency |
| alertmanager | Evaluates alert rules, generates notifications |
Workload Separation
ZO_NODE_ROLE_GROUP prevents resource-intensive queries (alerts, reports) from impacting real-time user searches by routing them to dedicated node groups.
Pipelines
OpenObserve supports ingestion-time pipelines for data transformation:
- Field extraction: Parse structured data from log lines
- Enrichment: Add fields from lookup tables
- Filtering: Drop unwanted logs before storage
- Routing: Send data to different streams based on content
Pipelines use VRL (Vector Remap Language) for transformation logic.
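The four pipeline stages can be sketched end to end. The example below is written in Python as a stand-in for VRL, so it shows the shape of the logic rather than actual pipeline syntax; the log line format, the `SERVICE_OWNERS` lookup table, and the stream names are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical lookup table and log format, for illustration only.
SERVICE_OWNERS = {"api": "platform-team", "web": "frontend-team"}
LINE_PATTERN = re.compile(r"^(?P<level>[A-Z]+) (?P<service>\w+) (?P<message>.*)$")


def run_pipeline(line: str) -> Optional[tuple[str, dict]]:
    """Return (target_stream, record), or None when the line is dropped."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None  # this sketch drops lines it cannot parse
    record = match.groupdict()                                          # field extraction
    record["owner"] = SERVICE_OWNERS.get(record["service"], "unknown")  # enrichment
    if record["level"] == "DEBUG":                                      # filtering
        return None
    stream = "errors" if record["level"] == "ERROR" else "app_logs"     # routing
    return stream, record
```

Running the stages at ingestion time means dropped lines never reach storage and each record is written once, to the stream chosen by the routing step.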
Benchmarks
Storage efficiency, query performance, and cost analysis for OpenObserve.
Storage Efficiency vs Elasticsearch
Architectural Comparison
| Factor | OpenObserve | Elasticsearch |
| --- | --- | --- |
| Storage format | Parquet (columnar) + Zstd | Lucene segments (row-oriented) + inverted index |
| Compression ratio | ~10:1 | ~1.5:1 |
| Storage tier | Object storage (S3: $0.023/GB/mo) | SSD ($0.10+/GB/mo) |
| Indexing | Optional per-column bloom filters | Full inverted index on every field |
| Net storage cost | ~$0.002/GB/mo | ~$0.28/GB/mo |
| Claimed advantage | ~140x cheaper storage | n/a |
Why 140x Cheaper
The 140x claim combines three factors:
- Compression: Parquet columnar + Zstd achieves ~10:1 vs Lucene's ~1.5:1 → ~7x fewer stored bytes
- Storage tier: S3 ($0.023/GB/mo) vs SSD ($0.10+/GB/mo) → ~4–5x cheaper per GB
- No replica overhead: S3 provides 11-nines durability natively vs manually replicated Elasticsearch shards → ~2–3x savings
Combined: ~7x × ~4x × ~2.5x ≈ ~70x at these midpoints, or roughly 50–100x across the stated ranges. The 140x headline corresponds to the net per-GB storage figures (~$0.28 vs ~$0.002 per GB per month), so it sits above what the three factors alone produce.
Caveat: This is a vendor-provided comparison. Actual ratios depend on data patterns, compression ratios, and S3 pricing tiers.
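The arithmetic behind the claim can be checked directly. The sketch below multiplies the low and high ends of the three listed factors and separately computes the ratio of the net per-GB figures. The name `combined_factor` is illustrative, and every input is a vendor-provided number from this page rather than a measurement.

```python
def combined_factor(compression: float, tier: float, replica: float) -> float:
    """Multiply the three independent savings factors."""
    return compression * tier * replica


# Compression: ~10:1 (Parquet + Zstd) vs ~1.5:1 (Lucene)
compression_factor = 10 / 1.5                          # about 6.7x

low = combined_factor(compression_factor, 4.0, 2.0)    # low end of each range
high = combined_factor(compression_factor, 5.0, 3.0)   # high end of each range

# Headline ratio from the net storage cost figures (~$0.28 vs ~$0.002 per GB/mo)
headline = 0.28 / 0.002

print(f"factor range: {low:.0f}x to {high:.0f}x, headline: {headline:.0f}x")
```

The multiplied factors land at roughly 53x to 100x, while the ratio of the net per-GB costs gives 140x, which is worth keeping in mind when comparing the claim against your own data.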
Analytical Queries (Aggregations)
| Aspect | OpenObserve | Elasticsearch |
| --- | --- | --- |
| Column pruning | Yes (read only needed columns) | No (reads full documents) |
| Predicate pushdown | Yes (DataFusion → Parquet row group stats) | Partial (inverted index) |
| Vectorized execution | Yes (Apache Arrow batches) | No |
| Aggregation speed | Often faster for analytical patterns | Faster for full-text search |
Full-Text Search
| Aspect | OpenObserve | Elasticsearch |
| --- | --- | --- |
| Approach | Parquet scan + bloom filters | Inverted index |
| Wildcard search | Full scan (slower) | Fast (inverted index) |
| Best for | Known-field searches, aggregations | Complex full-text search |
Single-Node (Dev/POC)
| Metric | Value |
| --- | --- |
| Binary size | ~50 MB |
| Startup time | < 5 seconds |
| Idle RAM | ~50–100 MB |
| Minimum resources | 1 CPU, 512 MB RAM |
Production HA
| Component | CPU | RAM | Storage |
| --- | --- | --- | --- |
| Ingester (×3) | 2 vCPU | 4 GB | 100 GB WAL disk |
| Querier (×2) | 4 vCPU | 8 GB | 50 GB cache disk |
| Compactor (×1) | 2 vCPU | 4 GB | 50 GB temp |
| Router (×2) | 1 vCPU | 1 GB | n/a |
| AlertManager (×1) | 1 vCPU | 1 GB | n/a |
| PostgreSQL | 1 vCPU | 2 GB | 20 GB |
| S3 | n/a | n/a | Unlimited |
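For capacity planning it can help to total the per-component figures. The sketch below sums vCPU and RAM across the cluster using the replica counts and sizes from the table; it assumes a single PostgreSQL instance (the table gives no replica count for it), and `cluster_totals` is an illustrative helper.

```python
# (component, replicas, vCPU each, RAM GB each), copied from the sizing table.
# The PostgreSQL replica count of 1 is an assumption.
SIZING = [
    ("ingester", 3, 2, 4),
    ("querier", 2, 4, 8),
    ("compactor", 1, 2, 4),
    ("router", 2, 1, 1),
    ("alertmanager", 1, 1, 1),
    ("postgresql", 1, 1, 2),
]


def cluster_totals(sizing: list[tuple[str, int, int, int]]) -> tuple[int, int]:
    """Return (total vCPU, total RAM in GB) across all replicas."""
    total_cpu = sum(replicas * cpu for _name, replicas, cpu, _ram in sizing)
    total_ram = sum(replicas * ram for _name, replicas, _cpu, ram in sizing)
    return total_cpu, total_ram
```

With these inputs the cluster comes to 20 vCPU and 37 GB of RAM, excluding object storage.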
Cost Comparison (100 GB/day logs, 30-day retention)
| Cost Item | OpenObserve (self-hosted) | Elasticsearch (self-hosted) |
| --- | --- | --- |
| Storage | ~$7/mo (S3, 300 GB after compression) | ~$300/mo (3 TB SSD, 3× replicated) |
| Compute | ~$500/mo (small stateless nodes) | ~$1,500/mo (3× data nodes, 64 GB each) |
| Total | ~$507/mo | ~$1,800/mo |
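The storage rows follow from the ingestion volume, retention, compression ratio, and per-GB price. The sketch below reproduces them under the table's own assumptions: the Elasticsearch side takes the 3 TB SSD figure as given, the compute costs are copied from the table, and `monthly_storage_cost` is an illustrative helper.

```python
S3_PRICE_PER_GB = 0.023   # USD per GB per month, from the comparison above
SSD_PRICE_PER_GB = 0.10   # USD per GB per month, from the comparison above


def monthly_storage_cost(daily_gb: float, retention_days: int,
                         compression_ratio: float, price_per_gb: float) -> float:
    """Monthly cost of the data retained at steady state."""
    stored_gb = daily_gb * retention_days / compression_ratio
    return stored_gb * price_per_gb


# OpenObserve: 100 GB/day * 30 days = 3,000 GB raw, ~300 GB after ~10:1 compression
o2_storage = monthly_storage_cost(100, 30, 10, S3_PRICE_PER_GB)
# Elasticsearch: the table bills 3 TB of SSD, so no compression is applied here
es_storage = monthly_storage_cost(100, 30, 1, SSD_PRICE_PER_GB)

o2_total = o2_storage + 500    # compute figure from the table
es_total = es_storage + 1500   # compute figure from the table

print(f"OpenObserve: ${o2_total:,.0f}/mo, Elasticsearch: ${es_total:,.0f}/mo")
```

At this scale compute dominates both totals, so the overall difference is about 3.5x even though the storage line items differ by a much larger factor.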
Scale Limits
| Dimension | Practical Limit | Notes |
| --- | --- | --- |
| Daily ingestion | PB-scale | S3 write throughput bottleneck |
| Query concurrency | 50–100 | Add querier replicas |
| Retention | Unlimited | S3 lifecycle policies |
| Streams (indices) | 10,000+ | Metadata store may need PostgreSQL |
| Single query scan | TB-range | DataFusion parallelizes across partitions |
Caveats
- The 140x cost claim comes from vendor benchmarks and combines compression, storage tier, and replication savings.
- Full-text search performance lags behind Elasticsearch's inverted index for wildcard/fuzzy queries.
- DataFusion is less battle-tested than ClickHouse or Elasticsearch at extreme scale.
- Performance varies significantly with data patterns and query types.
Sources