How It Works¶
How Monoscope ingests telemetry via OTLP, stores it in S3 through TimeFusion, and layers LLM-powered querying and scheduled AI agents on top.
Ingestion Pipeline¶
Monoscope uses OpenTelemetry Protocol (OTLP) as its sole ingestion path:
flowchart LR
subgraph Apps["Your Applications"]
SDK1["Go SDK"]
SDK2["Python SDK"]
SDK3["Node.js SDK"]
SDK4["Java Agent"]
end
subgraph Collector["OTel Collector"]
OTLP["OTLP Receiver\n(gRPC :4317)"]
end
subgraph Monoscope["Monoscope Backend"]
API["Ingestion API\n(Haskell)"]
Kafka["Kafka\n(Buffer)"]
Worker["Extraction Worker"]
end
subgraph Storage["Data Layer"]
TF["TimeFusion\n(Rust + DataFusion)"]
PG["PostgreSQL\n+ TimescaleDB"]
S3["S3 Bucket\n(Delta Lake)"]
end
Apps -->|"OTLP"| Collector
Collector -->|"OTLP/gRPC\nBearer API_KEY"| API
API --> Kafka --> Worker
Worker --> TF --> S3
Worker --> PG
OTLP Ingestion¶
All telemetry arrives via OTLP over gRPC on port 4317 with Bearer token authentication:
- Logs — structured and unstructured log events
- Traces — spans with parent-child relationships, duration, attributes
- Metrics — Sum, Histogram, ExponentialHistogram, Summary types
The ingestion API normalizes all telemetry into a unified otel_logs_and_spans table schema before handing it to TimeFusion.
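In a typical deployment, applications send to a local OTel Collector, which forwards everything to Monoscope with the API key attached as a Bearer token. A minimal Collector configuration sketch (the Monoscope endpoint hostname and the environment variable holding the key are placeholders, not documented values):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # matches the OTLP gRPC default port

exporters:
  otlp/monoscope:
    endpoint: monoscope.example.com:4317        # placeholder host
    headers:
      authorization: Bearer ${env:MONOSCOPE_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/monoscope]
    logs:
      receivers: [otlp]
      exporters: [otlp/monoscope]
    metrics:
      receivers: [otlp]
      exporters: [otlp/monoscope]
```

All three signal types flow through the same exporter, mirroring the single unified ingestion path described above.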
TimeFusion Storage Engine¶
TimeFusion is Monoscope's purpose-built time-series database (separate open-source project at monoscope-tech/timefusion):
flowchart TB
subgraph TF["TimeFusion Engine (Rust)"]
PGWire["PostgreSQL Wire Protocol\n(pgwire)"]
DF["Apache DataFusion\n(Query Engine)"]
Cache["Two-Tier Cache\n(Foyer)"]
Mem["Memory Cache\n512MB default"]
Disk["Disk Cache\n100GB default"]
DL["Delta Lake\n(ACID Transactions)"]
end
subgraph S3["S3-Compatible Storage"]
PQ["Parquet Files\n(Zstd compressed)"]
end
PGWire --> DF
DF --> Cache
Cache --> Mem
Cache --> Disk
DF --> DL --> PQ
style TF fill:#1565c0,color:#fff
style S3 fill:#2e7d32,color:#fff
Key Properties¶
| Property | Detail |
|---|---|
| Wire protocol | PostgreSQL-compatible via pgwire — any Postgres client can query |
| Query engine | Apache DataFusion with vectorized execution |
| Storage format | Delta Lake with Parquet files on S3 |
| Compression | Zstandard (10-20x reduction) |
| Throughput | 500K+ events/sec per instance |
| ACID | Delta Lake transactions for consistency |
| Caching | Foyer adaptive: 512MB memory + 100GB disk, 7-day TTL, 95%+ hit rate |
| Distributed | DynamoDB-based locking for multi-instance deployments |
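The memory-then-disk lookup order of the cache can be illustrated with a short sketch. Foyer itself is a Rust library with adaptive eviction and TTLs; the Python below is only a toy model of the tiering behavior (class name, sizes, and eviction policy are illustrative assumptions):

```python
import os
import tempfile

class TwoTierCache:
    """Toy memory-then-disk cache, loosely modeled on Foyer's tiering.
    The real engine defaults to 512MB memory + 100GB disk with a 7-day
    TTL; here eviction is a naive demote-oldest-to-disk."""

    def __init__(self, cache_dir, max_memory_items=2):
        self.memory = {}                  # tier 1: in-process dict
        self.cache_dir = cache_dir        # tier 2: files on disk
        self.max_memory_items = max_memory_items

    def _path(self, key):
        return os.path.join(self.cache_dir, f"{key}.cache")

    def put(self, key, value):
        if len(self.memory) >= self.max_memory_items:
            # demote the oldest in-memory entry to the disk tier
            old_key = next(iter(self.memory))
            with open(self._path(old_key), "wb") as f:
                f.write(self.memory.pop(old_key))
        self.memory[key] = value

    def get(self, key):
        if key in self.memory:            # tier-1 hit: no I/O
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):          # tier-2 hit: read from disk
            with open(path, "rb") as f:   # and promote back into memory
                value = f.read()
            self.put(key, value)
            return value
        return None                       # miss: caller fetches from S3

cache = TwoTierCache(tempfile.mkdtemp())
cache.put("seg-a", b"parquet-bytes-a")
cache.put("seg-b", b"parquet-bytes-b")
cache.put("seg-c", b"parquet-bytes-c")   # evicts seg-a to disk
print(cache.get("seg-a"))                # found on disk, promoted back
```

A miss at both tiers is what triggers a read from the Parquet files on S3; the high hit rate quoted above is what keeps most queries off object storage entirely.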
Main Table Schema¶
The otel_logs_and_spans table stores all telemetry in a unified schema:
| Column | Type | Purpose |
|---|---|---|
| name | text | Span/log name (e.g., HTTP endpoint path) |
| id | uuid | Unique identifier |
| project_id | uuid | Tenant/project isolation |
| timestamp | timestamptz | Event timestamp |
| date | date | Partition key |
| hashes | text[] | Trace lookup hashes |
| duration | bigint | Span duration in nanoseconds |
| attributes___http___response___status_code | text | Flattened OTel attributes (triple underscore separator) |
| attributes___user___id | text | User identity propagation |
| attributes___error___type | text | Error classification |
| kind | text | Span kind (SERVER, CLIENT, INTERNAL, etc.) |
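The flattened columns follow a mechanical rule: prefix with `attributes` and replace each dot in the OTel attribute key with a triple underscore. A sketch of the normalization into this schema (the helper names are hypothetical; the actual work happens in the extraction worker):

```python
from datetime import datetime, timezone

def flatten_attribute_key(otel_key: str) -> str:
    """Map a dotted OTel attribute key to its otel_logs_and_spans column:
    'http.response.status_code' -> 'attributes___http___response___status_code'."""
    return "attributes___" + otel_key.replace(".", "___")

def to_unified_row(span: dict) -> dict:
    """Sketch of normalizing one span into the unified schema.
    The 'date' partition key is derived from the event timestamp."""
    ts = datetime.fromtimestamp(span["start_unix_nano"] / 1e9, tz=timezone.utc)
    row = {
        "name": span["name"],
        "kind": span["kind"],
        "timestamp": ts.isoformat(),
        "date": ts.date().isoformat(),                                # partition key
        "duration": span["end_unix_nano"] - span["start_unix_nano"],  # nanoseconds
    }
    for key, value in span["attributes"].items():
        row[flatten_attribute_key(key)] = str(value)  # attribute columns are text
    return row

row = to_unified_row({
    "name": "GET /checkout",
    "kind": "SERVER",
    "start_unix_nano": 1_700_000_000_000_000_000,
    "end_unix_nano": 1_700_000_000_250_000_000,
    "attributes": {"http.response.status_code": 200, "user.id": "u-42"},
})
print(row["attributes___http___response___status_code"])  # → 200
print(row["duration"])                                    # → 250000000
```

Because every attribute becomes an ordinary column, queries can filter on them directly without unpacking JSON blobs.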
Natural Language Query Engine¶
Monoscope integrates LLMs to translate plain-English queries into SQL executed against TimeFusion:
- User input — "Show me all 500 errors from the payments service yesterday"
- LLM translation — converts to a parameterized SQL query targeting otel_logs_and_spans
- Query execution — TimeFusion executes it with the vectorized DataFusion engine
- Result visualization — charts, log tables, and trace waterfalls rendered in the UI
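For the example query above, the generated SQL might take the following shape. This is illustrative only: the exact text an LLM emits varies run to run, and the service filter is shown against the span name for simplicity (the real schema may expose a dedicated service column):

```python
# Plausible output of the translation step for:
#   "Show me all 500 errors from the payments service yesterday"
# Column names follow the unified otel_logs_and_spans schema; parameters
# are bound by the query layer rather than interpolated by the LLM.
generated_sql = """
SELECT timestamp, name, duration,
       attributes___http___response___status_code AS status
FROM otel_logs_and_spans
WHERE project_id = %(project_id)s
  AND attributes___http___response___status_code = '500'
  AND name LIKE '%%payments%%'
  AND timestamp >= %(yesterday_start)s
  AND timestamp <  %(yesterday_end)s
ORDER BY timestamp DESC;
"""
```

Because TimeFusion speaks the Postgres wire protocol, any Postgres client (psql, psycopg, JDBC) could execute this statement directly, with or without the LLM front end.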
AI Agent Scheduler¶
Scheduled agents run LLM-powered analysis on telemetry data:
flowchart LR
Scheduler["Agent Scheduler\n(Haskell)"]
LLM["LLM API"]
Data["TimeFusion\nQuery"]
Detect["Anomaly Detection"]
Report["Email Report"]
Alert["Alert Channels"]
Scheduler -->|"Query + Analyze"| Data
Data --> LLM
LLM --> Detect
Detect -->|"Anomaly found"| Report
Detect -->|"Critical"| Alert
- Configurable intervals: hourly, daily, weekly
- Anomaly detection: volume spikes, error rate changes, latency degradation
- Email reports: summary of findings delivered to configured recipients
- Alerting: critical findings routed to Slack, Discord, PagerDuty, or webhooks
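The spike checks above can be approximated with simple statistics. A hedged sketch of one such check (the z-score test, threshold, and window size are illustrative assumptions, not Monoscope's actual algorithm, which delegates analysis to an LLM):

```python
from statistics import mean, stdev

def detect_spike(hourly_counts, threshold=3.0):
    """Flag the latest hour if it deviates from the trailing window by
    more than `threshold` standard deviations (a z-score test).
    Applicable to event volume, error counts, or latency percentiles."""
    *history, latest = hourly_counts
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu          # flat history: any change is a spike
    return abs(latest - mu) / sigma > threshold

# Error counts for the last 8 hours; the final hour spikes.
counts = [12, 9, 11, 10, 13, 11, 10, 96]
print(detect_spike(counts))  # → True
```

A scheduler would run a check like this at each configured interval and, on a hit, hand the anomalous window to the LLM for summarization and routing.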
Error Fingerprinting¶
Monoscope uses a two-tier fingerprinting system:
- Jaccard similarity — groups errors with similar stack traces using set-based comparison
- Embedding-based merging — semantically similar errors are merged even with different text
- Framework-error rollup — known framework errors (e.g., Django Http404, Express ECONNREFUSED) are automatically categorized
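The first tier's set-based comparison can be sketched in a few lines (the line-level tokenization and the 0.6 cutoff are illustrative assumptions, not Monoscope's tuned values):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a or b else 1.0

def same_error_group(trace_a: str, trace_b: str, cutoff=0.6) -> bool:
    """Group two stack traces when their frame sets overlap enough.
    Comparing frames as a set makes ordering irrelevant, so traces that
    differ in only a frame or two still land in the same group."""
    frames_a = set(trace_a.strip().splitlines())
    frames_b = set(trace_b.strip().splitlines())
    return jaccard(frames_a, frames_b) >= cutoff

t1 = "at pay()\nat charge()\nat handler()\nat main()"
t2 = "at pay()\nat charge()\nat handler()\nat serve()"   # one frame differs
t3 = "at parse()\nat load()"                             # unrelated error
print(same_error_group(t1, t2))  # → True
print(same_error_group(t1, t3))  # → False
```

Errors that this textual tier keeps apart, but that mean the same thing, are what the second, embedding-based tier then merges.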
Session Replay¶
Browser session recordings synced with backend telemetry:
- Browser SDK captures DOM mutations, user interactions, and network requests
- Events are batched and sent to Monoscope's ingestion API
- Session merging worker combines replay events with backend spans using correlation IDs
- Merged sessions are stored in S3 and viewable in the UI alongside traces and logs
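The merge step amounts to a join on the shared correlation ID followed by a sort into one timeline. A simplified sketch (field names are assumptions, not the actual Monoscope event schema):

```python
def merge_session(replay_events, backend_spans):
    """Attach backend spans to a replay session via shared correlation IDs,
    then sort everything by timestamp so the UI can render browser events
    and server spans on a single timeline."""
    corr_ids = {e["correlation_id"] for e in replay_events}
    matched = [s for s in backend_spans if s["correlation_id"] in corr_ids]
    return sorted(replay_events + matched, key=lambda x: x["timestamp"])

replay = [
    {"timestamp": 100, "kind": "click", "correlation_id": "req-1"},
    {"timestamp": 300, "kind": "navigation", "correlation_id": "req-2"},
]
spans = [
    {"timestamp": 120, "kind": "span", "name": "POST /checkout",
     "correlation_id": "req-1"},
    {"timestamp": 500, "kind": "span", "name": "GET /other",
     "correlation_id": "req-9"},   # different session: excluded
]
timeline = merge_session(replay, spans)
print([e["kind"] for e in timeline])  # → ['click', 'span', 'navigation']
```

The worker persists the merged timeline to S3, which is what the UI loads when a session is opened next to its traces and logs.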