How It Works

How Monoscope ingests telemetry via OTLP, stores it in S3 through TimeFusion, and provides LLM-powered querying with AI agent scheduling.

Ingestion Pipeline

Monoscope uses OpenTelemetry Protocol (OTLP) as its sole ingestion path:

flowchart LR
    subgraph Apps["Your Applications"]
        SDK1["Go SDK"]
        SDK2["Python SDK"]
        SDK3["Node.js SDK"]
        SDK4["Java Agent"]
    end

    subgraph Collector["OTel Collector"]
        OTLP["OTLP Receiver\n(gRPC :4317)"]
    end

    subgraph Monoscope["Monoscope Backend"]
        API["Ingestion API\n(Haskell)"]
        Kafka["Kafka\n(Buffer)"]
        Worker["Extraction Worker"]
    end

    subgraph Storage["Data Layer"]
        TF["TimeFusion\n(Rust + DataFusion)"]
        PG["PostgreSQL\n+ TimescaleDB"]
        S3["S3 Bucket\n(Delta Lake)"]
    end

    Apps -->|"OTLP"| Collector
    Collector -->|"OTLP/gRPC\nBearer API_KEY"| API
    API --> Kafka --> Worker
    Worker --> TF --> S3
    Worker --> PG

OTLP Ingestion

All telemetry arrives via OTLP over gRPC on port 4317 with Bearer token authentication:

  • Logs — structured and unstructured log events
  • Traces — spans with parent-child relationships, duration, attributes
  • Metrics — Sum, Histogram, ExponentialHistogram, Summary types

The ingestion API normalizes all incoming telemetry into a single otel_logs_and_spans table schema before handing it off to TimeFusion.
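For reference, a minimal OTel Collector pipeline that receives OTLP on gRPC :4317 and forwards it to Monoscope with a Bearer token might look like the following sketch (the exporter endpoint and the MONOSCOPE_API_KEY variable are illustrative assumptions, not documented values):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: monoscope.example.com:4317   # assumed ingestion endpoint
    headers:
      authorization: "Bearer ${MONOSCOPE_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```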

TimeFusion Storage Engine

TimeFusion is Monoscope's purpose-built time-series database, maintained as a separate open-source project at monoscope-tech/timefusion:

flowchart TB
    subgraph TF["TimeFusion Engine (Rust)"]
        PGWire["PostgreSQL Wire Protocol\n(pgwire)"]
        DF["Apache DataFusion\n(Query Engine)"]
        Cache["Two-Tier Cache\n(Foyer)"]
        Mem["Memory Cache\n512MB default"]
        Disk["Disk Cache\n100GB default"]
        DL["Delta Lake\n(ACID Transactions)"]
    end

    subgraph S3["S3-Compatible Storage"]
        PQ["Parquet Files\n(Zstd compressed)"]
    end

    PGWire --> DF
    DF --> Cache
    Cache --> Mem
    Cache --> Disk
    DF --> DL --> PQ

    style TF fill:#1565c0,color:#fff
    style S3 fill:#2e7d32,color:#fff

Key Properties

| Property | Detail |
| --- | --- |
| Wire protocol | PostgreSQL-compatible via pgwire; any Postgres client can query |
| Query engine | Apache DataFusion with vectorized execution |
| Storage format | Delta Lake with Parquet files on S3 |
| Compression | Zstandard (10-20x reduction) |
| Throughput | 500K+ events/sec per instance |
| ACID | Delta Lake transactions for consistency |
| Caching | Foyer adaptive: 512MB memory + 100GB disk, 7-day TTL, 95%+ hit rate |
| Distributed | DynamoDB-based locking for multi-instance deployments |

Main Table Schema

The otel_logs_and_spans table stores all telemetry in a unified schema:

| Column | Type | Purpose |
| --- | --- | --- |
| name | text | Span/log name (e.g., HTTP endpoint path) |
| id | uuid | Unique identifier |
| project_id | uuid | Tenant/project isolation |
| timestamp | timestamptz | Event timestamp |
| date | date | Partition key |
| hashes | text[] | Trace lookup hashes |
| duration | bigint | Span duration in nanoseconds |
| attributes___http___response___status_code | text | Flattened OTel attributes (triple-underscore separator) |
| attributes___user___id | text | User identity propagation |
| attributes___error___type | text | Error classification |
| kind | text | Span kind (SERVER, CLIENT, INTERNAL, etc.) |
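The triple-underscore columns above can be produced by recursively flattening nested OTel attribute maps. A minimal sketch of the idea (the function name and exact separator handling are illustrative, not Monoscope's actual extraction code):

```python
def flatten_attributes(attrs, prefix="attributes"):
    """Flatten nested attribute dicts into triple-underscore column names."""
    flat = {}
    for key, value in attrs.items():
        # OTel attribute keys use dots (http.response.status_code);
        # both dots and nesting become the ___ separator.
        column = f"{prefix}___{key.replace('.', '___')}"
        if isinstance(value, dict):
            flat.update(flatten_attributes(value, column))
        else:
            flat[column] = str(value)  # the schema stores attribute values as text
    return flat

flat = flatten_attributes({"http": {"response": {"status_code": 500}}, "user.id": "u-42"})
# flat["attributes___http___response___status_code"] == "500"
# flat["attributes___user___id"] == "u-42"
```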

Natural Language Query Engine

Monoscope integrates LLMs to translate plain-English queries into SQL executed against TimeFusion:

  1. User input — "Show me all 500 errors from the payments service yesterday"
  2. LLM translation — converts to a parameterized SQL query targeting otel_logs_and_spans
  3. Query execution — TimeFusion executes with vectorized DataFusion engine
  4. Result visualization — charts, log tables, and trace waterfalls rendered in the UI
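For the example input above, the generated SQL might resemble the following sketch. Table and column names come from the schema described earlier; the date arithmetic is illustrative, and the schema excerpt shown does not include a service-name column, so that filter is omitted here:

```sql
SELECT timestamp, name, attributes___error___type, duration
FROM otel_logs_and_spans
WHERE project_id = $1
  AND attributes___http___response___status_code = '500'
  AND date = CURRENT_DATE - INTERVAL '1 day'
ORDER BY timestamp DESC;
```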

AI Agent Scheduler

Scheduled agents run LLM-powered analysis on telemetry data:

flowchart LR
    Scheduler["Agent Scheduler\n(Haskell)"]
    LLM["LLM API"]
    Data["TimeFusion\nQuery"]
    Detect["Anomaly Detection"]
    Report["Email Report"]
    Alert["Alert Channels"]

    Scheduler -->|"Query + Analyze"| Data
    Data --> LLM
    LLM --> Detect
    Detect -->|"Anomaly found"| Report
    Detect -->|"Critical"| Alert

  • Configurable intervals: hourly, daily, weekly
  • Anomaly detection: volume spikes, error rate changes, latency degradation
  • Email reports: summary of findings delivered to configured recipients
  • Alerting: critical findings routed to Slack, Discord, PagerDuty, or webhooks
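The routing logic above can be sketched as a simple threshold check on error rate relative to a baseline. This is an illustrative stand-in (the thresholds and function name are assumptions), not the Haskell scheduler's actual code:

```python
def route_finding(baseline_error_rate, current_error_rate,
                  report_factor=2.0, critical_factor=5.0):
    """Decide where a detected anomaly should be routed.

    Returns 'none', 'report' (email summary), or 'alert'
    (Slack/Discord/PagerDuty/webhook).
    """
    if baseline_error_rate <= 0:
        # Any errors against a zero baseline are treated as critical.
        return "alert" if current_error_rate > 0 else "none"
    ratio = current_error_rate / baseline_error_rate
    if ratio >= critical_factor:
        return "alert"     # critical: page someone immediately
    if ratio >= report_factor:
        return "report"    # notable: include in the scheduled email report
    return "none"

print(route_finding(0.01, 0.08))  # 8x the baseline → "alert"
```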

Error Fingerprinting

Monoscope groups errors through three stages:

  1. Jaccard similarity — groups errors with similar stack traces using set-based comparison
  2. Embedding-based merging — semantically similar errors are merged even with different text
  3. Framework-error rollup — known framework errors (e.g., Django Http404, Express ECONNREFUSED) are automatically categorized
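The first stage can be sketched with plain set arithmetic over tokenized stack traces. A minimal example (the line-based tokenizer and the 0.7 cutoff are assumptions for illustration; Monoscope's actual tokenization and threshold may differ):

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def same_group(trace_a, trace_b, threshold=0.7):
    """Group two stack traces when their frame sets are similar enough."""
    frames_a = set(trace_a.splitlines())
    frames_b = set(trace_b.splitlines())
    return jaccard(frames_a, frames_b) >= threshold

t1 = "payments.charge\nhttp.handler\ndb.query"
t2 = "payments.charge\nhttp.handler\ndb.execute"
print(same_group(t1, t2))  # 2 shared of 4 distinct frames → 0.5 < 0.7 → False
```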

Session Replay

Browser session recordings synced with backend telemetry:

  1. Browser SDK captures DOM mutations, user interactions, and network requests
  2. Events are batched and sent to Monoscope's ingestion API
  3. Session merging worker combines replay events with backend spans using correlation IDs
  4. Merged sessions are stored in S3 and viewable in the UI alongside traces and logs
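The merge in step 3 can be sketched as a join on a shared correlation ID followed by a sort onto one timeline. Illustrative only; the field names (trace_id, ts) are assumptions, not the worker's actual event shape:

```python
def merge_session(replay_events, backend_spans, trace_id):
    """Combine browser replay events and backend spans that share a
    correlation ID, ordered on a single timeline."""
    timeline = [e for e in replay_events if e["trace_id"] == trace_id]
    timeline += [s for s in backend_spans if s["trace_id"] == trace_id]
    return sorted(timeline, key=lambda item: item["ts"])

replay = [{"trace_id": "t1", "ts": 1, "kind": "click"},
          {"trace_id": "t2", "ts": 2, "kind": "click"}]
spans = [{"trace_id": "t1", "ts": 3, "kind": "SERVER span"}]
merged = merge_session(replay, spans, "t1")
print([item["kind"] for item in merged])  # ['click', 'SERVER span']
```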
