Architecture¶
Component topology, data model, deployment patterns, and technology choices for Monoscope.
Component Topology¶
flowchart TB
subgraph Clients["Client Applications"]
Apps["Your Apps\n(OTel SDKs)"]
Browser["Browser\n(Session Replay SDK)"]
end
subgraph Ingestion["Ingestion Layer"]
OTel["OTel Collector\n(gRPC :4317)"]
API["Monoscope API\n(Haskell)"]
Kafka["Kafka Buffer"]
end
subgraph Processing["Processing Layer"]
Worker["Extraction Worker"]
Agent["AI Agent\nScheduler"]
LLM["LLM API"]
end
subgraph Storage["Storage Layer"]
TF["TimeFusion\n(Rust + DataFusion)"]
PG["PostgreSQL\n+ TimescaleDB (pg18)"]
S3["S3 Bucket\n(Delta Lake / Parquet)"]
Cache["Foyer Cache\n(512MB mem + 100GB disk)"]
end
subgraph UI["Presentation Layer"]
Web["Web Dashboard\n(HTMX + Tailwind)"]
Alerts["Alert Channels\n(Slack, Discord, PagerDuty)"]
end
Apps -->|"OTLP/gRPC"| OTel
Browser -->|"Session events"| API
OTel -->|"Bearer token"| API
API --> Kafka --> Worker
Worker --> TF
Worker --> PG
TF --> S3
TF --> Cache
Agent -->|"Query"| TF
Agent --> LLM
Agent --> Alerts
Web -->|"SQL via pgwire"| TF
Web --> PG
style Storage fill:#2e7d32,color:#fff
style Ingestion fill:#1565c0,color:#fff
style Processing fill:#e65100,color:#fff
Technology Breakdown¶
| Component | Language | Framework/Library | Purpose |
|---|---|---|---|
| Monoscope Backend | Haskell (80.5%) | Hasql, Lucid, HTMX, Eff | API, ingestion, processing, web UI |
| TimeFusion | Rust | DataFusion, pgwire, Delta Lake, Foyer | Time-series query engine with S3 storage |
| Metadata DB | PLpgSQL (2.4%) | PostgreSQL + TimescaleDB (pg18) | Project config, alerts, user management |
| Frontend | TypeScript (11.7%) | HTMX, Tailwind v4, DaisyUI v5, ECharts | Server-rendered UI with dynamic updates |
| SDKs | Multi-language | OTel SDK wrappers | Application instrumentation |
| Migrations | PLpgSQL | 87KB of SQL migrations | Schema evolution |
Haskell Backend Internals¶
- `hasql-interpolate` for type-safe PostgreSQL queries (migrated from `postgresql-simple` in v0.5.0)
- `Lucid` for HTML templating (server-rendered)
- `HTMX` for dynamic page updates with morphing
- `Eff` effect system for IO abstraction
- `Effectful.Time` for time operations
- `Fourmolu` for code formatting
- GHC 9.12 compatible
Data Model¶
Telemetry Storage (TimeFusion / S3)¶
erDiagram
PROJECT ||--o{ OTEL_EVENTS : "contains"
OTEL_EVENTS {
uuid id PK
uuid project_id FK
timestamptz timestamp
date date_partition
text name
bigint duration_ns
text kind
text[] hashes
text attributes
}
TRACE {
uuid trace_id
uuid span_id
uuid parent_span_id
text service_name
text operation_name
}
LOG_ENTRY {
uuid id PK
timestamptz timestamp
text severity
text body
text attributes
}
METRIC {
text metric_name
text metric_type
float value
text labels
}
Metadata Storage (PostgreSQL + TimescaleDB)¶
- Projects — tenant isolation, API keys, retention settings
- Monitors — alerting rules, health checks, renotify intervals
- Alerting state — active incidents, notification history
- Users/Teams — authentication, authorization, audit logs
- AI Agent configs — schedules, LLM prompts, report recipients
Deployment Topologies¶
Docker Compose (Development / Small Production)¶
flowchart LR
subgraph Host["Docker Host"]
M["monoscope\n:8080"]
TF["timefusion\n:5432"]
PG["postgres+timescaledb\n:5433"]
K["kafka\n:9092"]
S3["localstack/minio\nS3-compatible"]
end
M --> TF --> S3
M --> PG
M --> K
Self-Hosted Production¶
flowchart TB
subgraph LB["Load Balancer"]
Nginx["NGINX\n(TLS Termination)"]
end
subgraph K8s["Kubernetes Cluster"]
subgraph Monoscope["Monoscope Pods"]
M1["monoscope-api-1"]
M2["monoscope-api-2"]
end
subgraph Workers["Background Workers"]
W1["extraction-worker"]
W2["ai-agent-scheduler"]
end
OTelCol["OTel Collector"]
end
subgraph Data["External Data"]
S3Prod["AWS S3 / MinIO"]
PGHA["PostgreSQL HA\n(Patroni / RDS)"]
TFProd["TimeFusion\n(Deployed separately)"]
KProd["Kafka Cluster"]
end
Nginx --> M1
Nginx --> M2
M1 --> S3Prod
M1 --> PGHA
M1 --> TFProd
Workers --> S3Prod
OTelCol --> M1
style K8s fill:#1565c0,color:#fff
style Data fill:#2e7d32,color:#fff
Monoscope Cloud (SaaS)¶
flowchart LR
subgraph Cloud["Monoscope Cloud"]
MC["Managed Monoscope\n+ TimeFusion"]
MCS3["Monoscope S3"]
end
subgraph BYOS["Your S3 Bucket\n(optional)"]
US3["Your S3\n(unlimited retention)"]
end
Apps["Your Apps"] -->|"OTLP"| Cloud
MC --> MCS3
MC -->|"BYOS mode"| US3
How It Works¶
How Monoscope ingests telemetry via OTLP, stores it in S3 through TimeFusion, and provides LLM-powered querying with AI agent scheduling.
Ingestion Pipeline¶
Monoscope uses the OpenTelemetry Protocol (OTLP) as its sole ingestion path for logs, traces, and metrics:
flowchart LR
subgraph Apps["Your Applications"]
SDK1["Go SDK"]
SDK2["Python SDK"]
SDK3["Node.js SDK"]
SDK4["Java Agent"]
end
subgraph Collector["OTel Collector"]
OTLP["OTLP Receiver\n(gRPC :4317)"]
end
subgraph Monoscope["Monoscope Backend"]
API["Ingestion API\n(Haskell)"]
Kafka["Kafka\n(Buffer)"]
Worker["Extraction Worker"]
end
subgraph Storage["Data Layer"]
TF["TimeFusion\n(Rust + DataFusion)"]
PG["PostgreSQL\n+ TimescaleDB"]
S3["S3 Bucket\n(Delta Lake)"]
end
Apps -->|"OTLP"| Collector
Collector -->|"OTLP/gRPC\nBearer API_KEY"| API
API --> Kafka --> Worker
Worker --> TF --> S3
Worker --> PG
OTLP Ingestion¶
All telemetry arrives via OTLP over gRPC on port 4317 with Bearer token authentication:
- Logs — structured and unstructured log events
- Traces — spans with parent-child relationships, duration, attributes
- Metrics — Sum, Histogram, ExponentialHistogram, Summary types
The ingestion API normalizes all data into the unified `otel_logs_and_spans` table schema before passing it to TimeFusion.
TimeFusion Storage Engine¶
TimeFusion is Monoscope's purpose-built time-series database (a separate open-source project at monoscope-tech/timefusion):
flowchart TB
subgraph TF["TimeFusion Engine (Rust)"]
PGWire["PostgreSQL Wire Protocol\n(pgwire)"]
DF["Apache DataFusion\n(Query Engine)"]
Cache["Two-Tier Cache\n(Foyer)"]
Mem["Memory Cache\n512MB default"]
Disk["Disk Cache\n100GB default"]
DL["Delta Lake\n(ACID Transactions)"]
end
subgraph S3["S3-Compatible Storage"]
PQ["Parquet Files\n(Zstd compressed)"]
end
PGWire --> DF
DF --> Cache
Cache --> Mem
Cache --> Disk
DF --> DL --> PQ
style TF fill:#1565c0,color:#fff
style S3 fill:#2e7d32,color:#fff
Key Properties¶
| Property | Detail |
|---|---|
| Wire protocol | PostgreSQL-compatible via pgwire — any Postgres client can query |
| Query engine | Apache DataFusion with vectorized execution |
| Storage format | Delta Lake with Parquet files on S3 |
| Compression | Zstandard (10-20x reduction) |
| Throughput | 500K+ events/sec per instance |
| ACID | Delta Lake transactions for consistency |
| Caching | Foyer adaptive: 512MB memory + 100GB disk, 7-day TTL, 95%+ hit rate |
| Distributed | DynamoDB-based locking for multi-instance deployments |
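Because TimeFusion speaks the PostgreSQL wire protocol, a standard Postgres driver should be able to query it. The following is a minimal sketch, not a confirmed client setup: it assumes the third-party `psycopg2` driver and the port 5432 shown for `timefusion` in the Docker Compose diagram, and the database name, user, and password are hypothetical placeholders. The helper names (`timefusion_conn_params`, `count_events_query`, `run_count`) are illustrative only.

```python
from datetime import date


def timefusion_conn_params(host="localhost", port=5432):
    """Connection settings for a Postgres client pointed at TimeFusion.

    dbname, user, and password are hypothetical placeholders.
    """
    return {
        "host": host,
        "port": port,
        "dbname": "postgres",    # assumption, not from the source
        "user": "postgres",      # assumption, not from the source
        "password": "postgres",  # assumption, not from the source
    }


def count_events_query(project_id, day):
    """Return (sql, params) counting one project's events on one day.

    The filter on `date` targets the column the main table schema
    lists as the partition key.
    """
    sql = (
        "SELECT count(*) FROM otel_logs_and_spans "
        "WHERE project_id = %s AND date = %s"
    )
    return sql, (project_id, day.isoformat())


def run_count(project_id, day):
    """Execute the count query; needs psycopg2 and a reachable TimeFusion."""
    import psycopg2  # imported lazily so the helpers above work without it

    sql, params = count_events_query(project_id, day)
    conn = psycopg2.connect(**timefusion_conn_params())
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchone()[0]
    finally:
        conn.close()
```

Keeping the query builder separate from the connection code makes the SQL easy to inspect before pointing it at a live instance.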
Main Table Schema¶
The otel_logs_and_spans table stores all telemetry in a unified schema:
| Column | Type | Purpose |
|---|---|---|
| `name` | text | Span/log name (e.g., HTTP endpoint path) |
| `id` | uuid | Unique identifier |
| `project_id` | uuid | Tenant/project isolation |
| `timestamp` | timestamptz | Event timestamp |
| `date` | date | Partition key |
| `hashes` | text[] | Trace lookup hashes |
| `duration` | bigint | Span duration in nanoseconds |
| `attributes___http___response___status_code` | text | Flattened OTel attributes (triple underscore separator) |
| `attributes___user___id` | text | User identity propagation |
| `attributes___error___type` | text | Error classification |
| `kind` | text | Span kind (SERVER, CLIENT, INTERNAL, etc.) |
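The triple-underscore column names can be illustrated with a short sketch. This is an assumption about how the flattening might work, inferred only from the column names in the table; the function `flatten_attributes` is hypothetical and is not taken from the Monoscope codebase.

```python
def flatten_attributes(attrs, prefix="attributes", sep="___"):
    """Flatten dotted or nested OTel attributes into column -> value pairs."""
    columns = {}
    for key, value in attrs.items():
        # A dotted key such as "http.response.status_code" becomes
        # "attributes___http___response___status_code".
        name = prefix + sep + key.replace(".", sep)
        if isinstance(value, dict):
            # Nested maps are flattened recursively under the same prefix.
            columns.update(flatten_attributes(value, prefix=name, sep=sep))
        else:
            # The schema lists these columns as text, so values are stringified.
            columns[name] = str(value)
    return columns
```

For example, `flatten_attributes({"http.response.status_code": 500, "user": {"id": "u-42"}})` yields the keys `attributes___http___response___status_code` and `attributes___user___id`, with the values `"500"` and `"u-42"`.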
Natural Language Query Engine¶
Monoscope integrates LLMs to translate plain-English queries into SQL executed against TimeFusion:
- User input: "Show me all 500 errors from the payments service yesterday"
- LLM translation: converts the request to a parameterized SQL query targeting `otel_logs_and_spans`
- Query execution: TimeFusion runs the query on its vectorized DataFusion engine
- Result visualization: charts, log tables, and trace waterfalls rendered in the UI
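As a rough illustration of the translation step, the example request could map to a query like the one below. This is a sketch under stated assumptions, not the SQL Monoscope actually generates: the `%s` placeholders follow the Python DB-API style used by drivers such as psycopg2, the function `build_error_query` is hypothetical, and the `resource___service___name` column is an invented name for the service filter, since the documented schema does not list one.

```python
from datetime import date, timedelta

# Columns other than resource___service___name come from the schema table.
ERRORS_SQL = """
SELECT id, name, "timestamp", duration
FROM otel_logs_and_spans
WHERE project_id = %s
  AND date = %s
  AND attributes___http___response___status_code = %s
  AND resource___service___name = %s  -- hypothetical column
ORDER BY "timestamp" DESC
"""


def build_error_query(project_id, service, today):
    """Return (sql, params) for '500 errors from <service> yesterday'."""
    yesterday = today - timedelta(days=1)
    # The status code column is typed as text, so "500" is passed as a string.
    params = (project_id, yesterday.isoformat(), "500", service)
    return ERRORS_SQL, params
```

Passing the values as parameters rather than interpolating them into the string keeps user-supplied text out of the SQL itself.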
AI Agent Scheduler¶
Scheduled agents run LLM-powered analysis on telemetry data:
flowchart LR
Scheduler["Agent Scheduler\n(Haskell)"]
LLM["LLM API"]
Data["TimeFusion\nQuery"]
Detect["Anomaly Detection"]
Report["Email Report"]
Alert["Alert Channels"]
Scheduler -->|"Query + Analyze"| Data
Data --> LLM
LLM --> Detect
Detect -->|"Anomaly found"| Report
Detect -->|"Critical"| Alert
- Configurable intervals: hourly, daily, weekly
- Anomaly detection: volume spikes, error rate changes, latency degradation
- Email reports: summary of findings delivered to configured recipients
- Alerting: critical findings routed to Slack, Discord, PagerDuty, or webhooks
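The routing at the end of this flow can be sketched as follows. The `Finding` type and `route_findings` function are hypothetical, and the sketch assumes that critical findings appear in the email report as well as in the alert channels, which the source does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    kind: str      # e.g. "volume_spike", "error_rate_change", "latency_degradation"
    summary: str
    critical: bool = False


def route_findings(findings):
    """Split findings into email-report items and alert-channel items."""
    report, alerts = [], []
    for finding in findings:
        # Every detected anomaly is summarized in the email report.
        report.append(finding)
        if finding.critical:
            # Critical findings are additionally sent to alert channels.
            alerts.append(finding)
    return report, alerts
```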
Error Fingerprinting¶
Monoscope groups errors with a two-tier fingerprinting system, plus a rollup for known framework errors:
- Jaccard similarity: groups errors with similar stack traces using set-based comparison
- Embedding-based merging: semantically similar errors are merged even when their text differs
- Framework-error rollup: known framework errors (e.g., Django `Http404`, Express `ECONNREFUSED`) are automatically categorized
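The Jaccard tier can be sketched in a few lines. This is a minimal illustration of set-based grouping in general, not Monoscope's implementation: it assumes each stack trace is represented as a list of frame names, and the greedy grouping strategy and the `threshold` value are hypothetical choices.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two iterables, taken as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty traces are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def group_by_similarity(traces, threshold=0.6):
    """Greedily place each trace in the first group whose representative
    (its first member) is at least `threshold` similar."""
    groups = []
    for frames in traces:
        for group in groups:
            if jaccard(group[0], frames) >= threshold:
                group.append(frames)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([frames])
    return groups
```

Two traces that share two of four distinct frames score 0.5, so they are grouped together only when `threshold` is 0.5 or lower.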
Session Replay¶
Browser session recordings synced with backend telemetry:
- Browser SDK captures DOM mutations, user interactions, and network requests
- Events are batched and sent to Monoscope's ingestion API
- Session merging worker combines replay events with backend spans using correlation IDs
- Merged sessions are stored in S3 and viewable in the UI alongside traces and logs
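The merging step can be pictured as a join on the correlation ID. The sketch below is an assumption about the general shape of that join: the `correlation_id` field name and the `merge_sessions` function are hypothetical, and events and spans are modeled as plain dictionaries.

```python
from collections import defaultdict


def merge_sessions(replay_events, spans, key="correlation_id"):
    """Group replay events and backend spans that share a correlation ID."""
    merged = defaultdict(lambda: {"replay_events": [], "spans": []})
    for event in replay_events:
        merged[event[key]]["replay_events"].append(event)
    for span in spans:
        # Spans without a matching replay session are skipped in this sketch.
        if span.get(key) in merged:
            merged[span[key]]["spans"].append(span)
    return dict(merged)
```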