# OpenObserve — Benchmarks

Storage efficiency, query performance, and cost analysis for OpenObserve.

## Storage Efficiency vs Elasticsearch

### Architectural Comparison
| Factor | OpenObserve | Elasticsearch |
|---|---|---|
| Storage format | Parquet (columnar) + Zstd | Lucene segments (row-oriented) + inverted index |
| Compression ratio | ~10:1 | ~1.5:1 |
| Storage tier | Object storage (S3: $0.023/GB/mo) | SSD ($0.10+/GB/mo) |
| Indexing | Optional per-column bloom filters | Full inverted index on every field |
| Net storage cost | ~$0.002/GB/mo | ~$0.28/GB/mo |
| Claimed advantage | ~140x cheaper storage | — |
### Why 140x Cheaper

The 140x claim combines three factors:

- **Compression:** Parquet columnar + Zstd achieves ~10:1 vs Lucene's ~1.5:1 → ~7x fewer raw bytes
- **Storage tier:** S3 ($0.023/GB/mo) vs SSD ($0.10+/GB/mo) → ~4–5x cheaper per GB
- **No replica overhead:** S3 provides 11-nines durability natively vs manually replicated Elasticsearch shards → ~2–3x savings

Combined: ~7x × ~4x × ~2.5x ≈ ~70–140x, depending on configuration.

Caveat: this is a vendor-provided comparison. Actual ratios depend on data patterns, compression achieved, and S3 pricing tiers.
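
The multiplication above can be reproduced as a back-of-envelope check. All inputs are the vendor-style estimates quoted in this section, not measurements:

```python
# Back-of-envelope reproduction of the ~140x storage-cost claim.
# Every input below is an estimate quoted in the section above.
compression_gain = 10 / 1.5      # Parquet+Zstd ~10:1 vs Lucene ~1.5:1 → ~6.7x fewer bytes
tier_gain = 0.10 / 0.023         # SSD $/GB/mo vs S3 $/GB/mo → ~4.3x cheaper tier
replication_gain = 2.5           # midpoint of the ~2-3x replica-overhead savings

combined = compression_gain * tier_gain * replication_gain
print(f"combined savings factor ≈ {combined:.0f}x")  # ≈ 72x at these midpoints
```

Taking the high end of each range (10:1 vs 1.5:1, $0.023 vs well above $0.10, 3x replication) pushes the product toward the claimed ~140x; the midpoints land near ~70x.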
## Analytical Queries (Aggregations)

| Aspect | OpenObserve | Elasticsearch |
|---|---|---|
| Column pruning | Yes (reads only needed columns) | No (reads full documents) |
| Predicate pushdown | Yes (DataFusion → Parquet row-group stats) | Partial (inverted index) |
| Vectorized execution | Yes (Apache Arrow batches) | No |
| Aggregation speed | Often faster for analytical patterns | Faster for full-text search |
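
The first two rows of the table can be illustrated with a toy model. This is a sketch of the general Parquet technique, not OpenObserve or DataFusion code; the row-group layout and field names are invented for illustration:

```python
# Toy columnar layout: two "row groups", each holding per-column value lists
# plus precomputed min/max statistics, as a Parquet file would.
row_groups = [
    {"stats": {"status": (200, 299)},
     "cols": {"status": [200, 204, 201], "latency_ms": [12, 8, 30]}},
    {"stats": {"status": (500, 503)},
     "cols": {"status": [500, 503, 500], "latency_ms": [90, 120, 85]}},
]

def avg_latency_where_status_ge(threshold):
    total, count = 0, 0
    for rg in row_groups:
        lo, hi = rg["stats"]["status"]
        if hi < threshold:
            continue  # predicate pushdown: stats prove no row matches; skip the group
        # column pruning: only the two referenced columns are ever touched
        for s, latency in zip(rg["cols"]["status"], rg["cols"]["latency_ms"]):
            if s >= threshold:
                total += latency
                count += 1
    return total / count

print(avg_latency_where_status_ge(500))  # only the second row group is scanned
```

A row-oriented store would deserialize every full document to answer the same query; here the first row group is skipped on statistics alone, and no column outside `status` and `latency_ms` is read.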
## Full-Text Search

| Aspect | OpenObserve | Elasticsearch |
|---|---|---|
| Approach | Parquet scan + bloom filters | Inverted index |
| Wildcard search | Full scan (slower) | Fast (inverted index) |
| Best for | Known-field searches, aggregations | Complex full-text search |
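
The bloom-filter tradeoff in the table comes down to this: a bloom filter answers "might this exact term be present?", so known-term searches can rule out whole files without scanning them, while a wildcard like `time*` cannot be hashed and forces a full scan. A toy filter (illustrative only, not OpenObserve's implementation) makes the mechanic concrete:

```python
import hashlib

class Bloom:
    """Toy bloom filter: k hash positions set in a fixed-size bit array."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0  # bit array packed into one int

    def _positions(self, term):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, term):
        for p in self._positions(term):
            self.array |= 1 << p

    def might_contain(self, term):
        # False means "definitely absent" → the file can be skipped entirely.
        return all(self.array >> p & 1 for p in self._positions(term))

bf = Bloom()
for token in ["error", "timeout", "payment-svc"]:
    bf.add(token)

print(bf.might_contain("timeout"))    # True: the file may contain this term
print(bf.might_contain("disk-full"))  # almost certainly False: skip the file
```

Exact-term queries get this skip-ahead for free; an inverted index additionally supports prefix and fuzzy matching, which is why Elasticsearch keeps the edge for complex full-text search.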
## Single-Node (Dev/POC)

| Metric | Value |
|---|---|
| Binary size | ~50 MB |
| Startup time | < 5 seconds |
| Idle RAM | ~50–100 MB |
| Minimum resources | 1 CPU, 512 MB RAM |
## Production HA

| Component | CPU | RAM | Storage |
|---|---|---|---|
| Ingester (×3) | 2 vCPU | 4 GB | 100 GB WAL disk |
| Querier (×2) | 4 vCPU | 8 GB | 50 GB cache disk |
| Compactor (×1) | 2 vCPU | 4 GB | 50 GB temp |
| Router (×2) | 1 vCPU | 1 GB | — |
| AlertManager (×1) | 1 vCPU | 1 GB | — |
| PostgreSQL | 1 vCPU | 2 GB | 20 GB |
| S3 | — | — | Unlimited |
## Cost Comparison (100 GB/day logs, 30-day retention)

| Cost Item | OpenObserve (self-hosted) | Elasticsearch (self-hosted) |
|---|---|---|
| Storage | ~$7/mo (S3, 300 GB after compression) | ~$300/mo (3 TB SSD, 3× replicated) |
| Compute | ~$500/mo (small stateless nodes) | ~$1,500/mo (3× data nodes, 64 GB each) |
| **Total** | **~$507/mo** | **~$1,800/mo** |
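
The storage line follows directly from the table's stated assumptions (these are the estimates above, not measured prices):

```python
# Deriving the storage row of the cost table from its own assumptions.
daily_gb, retention_days = 100, 30
raw_gb = daily_gb * retention_days     # 3,000 GB raw over the retention window

o2_stored_gb = raw_gb / 10             # ~10:1 compression → 300 GB on S3
o2_storage = o2_stored_gb * 0.023      # S3 at $0.023/GB/mo → ~$6.90/mo

es_ssd_gb = 3000                       # table's "3 TB SSD, 3x replicated" footprint
es_storage = es_ssd_gb * 0.10          # SSD at $0.10/GB/mo → ~$300/mo

print(f"OpenObserve ≈ ${o2_storage:.2f}/mo, Elasticsearch ≈ ${es_storage:.0f}/mo")
```

Compute dominates both totals at this volume; the storage gap only becomes the headline cost at longer retention windows or higher ingest rates.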
## Scale Limits

| Dimension | Practical Limit | Notes |
|---|---|---|
| Daily ingestion | PB-scale | S3 write throughput becomes the bottleneck |
| Query concurrency | 50–100 | Add querier replicas to scale out |
| Retention | Unlimited | Managed via S3 lifecycle policies |
| Streams (indices) | 10,000+ | Metadata store may need PostgreSQL |
| Single query scan | TB-range | DataFusion parallelizes across partitions |
## Caveats

- The 140x cost claim comes from vendor benchmarks and combines compression, storage-tier, and replication savings.
- Full-text search performance lags Elasticsearch's inverted index for wildcard/fuzzy queries.
- DataFusion is less battle-tested than ClickHouse or Elasticsearch at extreme scale.
- Performance varies significantly with data patterns and query types.
## Sources