How It Works

How Monoscope ingests telemetry via OTLP, stores it in S3 through TimeFusion, and provides LLM-powered querying with AI agent scheduling.

Ingestion Pipeline

Monoscope uses OpenTelemetry Protocol (OTLP) as its sole ingestion path:

flowchart LR
    subgraph Apps["Your Applications"]
        SDK1["Go SDK"]
        SDK2["Python SDK"]
        SDK3["Node.js SDK"]
        SDK4["Java Agent"]
    end

    subgraph Collector["OTel Collector"]
        OTLP["OTLP Receiver\n(gRPC :4317)"]
    end

    subgraph Monoscope["Monoscope Backend"]
        API["Ingestion API\n(Haskell)"]
        Kafka["Kafka\n(Buffer)"]
        Worker["Extraction Worker"]
    end

    subgraph Storage["Data Layer"]
        TF["TimeFusion\n(Rust + DataFusion)"]
        PG["PostgreSQL\n+ TimescaleDB"]
        S3["S3 Bucket\n(Delta Lake)"]
    end

    Apps -->|"OTLP"| Collector
    Collector -->|"OTLP/gRPC\nBearer API_KEY"| API
    API --> Kafka --> Worker
    Worker --> TF --> S3
    Worker --> PG

OTLP Ingestion

All telemetry arrives via OTLP over gRPC on port 4317 with Bearer token authentication:

  • Logs — structured and unstructured log events
  • Traces — spans with parent-child relationships, duration, attributes
  • Metrics — Sum, Histogram, ExponentialHistogram, Summary types

The ingestion API normalizes all incoming telemetry into a single otel_logs_and_spans table schema before handing it off to TimeFusion.
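For reference, a minimal OTel Collector pipeline that receives OTLP on gRPC :4317 and forwards it to Monoscope with a Bearer token might look like the following sketch (the exporter endpoint and the MONOSCOPE_API_KEY variable are illustrative assumptions, not documented values):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: monoscope.example.com:4317   # assumed ingestion endpoint
    headers:
      authorization: "Bearer ${MONOSCOPE_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```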

TimeFusion Storage Engine

TimeFusion is Monoscope's purpose-built time-series database, maintained as a separate open-source project at monoscope-tech/timefusion:

flowchart TB
    subgraph TF["TimeFusion Engine (Rust)"]
        PGWire["PostgreSQL Wire Protocol\n(pgwire)"]
        DF["Apache DataFusion\n(Query Engine)"]
        Cache["Two-Tier Cache\n(Foyer)"]
        Mem["Memory Cache\n512MB default"]
        Disk["Disk Cache\n100GB default"]
        DL["Delta Lake\n(ACID Transactions)"]
    end

    subgraph S3["S3-Compatible Storage"]
        PQ["Parquet Files\n(Zstd compressed)"]
    end

    PGWire --> DF
    DF --> Cache
    Cache --> Mem
    Cache --> Disk
    DF --> DL --> PQ

    style TF fill:#1565c0,color:#fff
    style S3 fill:#2e7d32,color:#fff

Key Properties

| Property | Detail |
| --- | --- |
| Wire protocol | PostgreSQL-compatible via pgwire; any Postgres client can query |
| Query engine | Apache DataFusion with vectorized execution |
| Storage format | Delta Lake with Parquet files on S3 |
| Compression | Zstandard (10-20x reduction) |
| Throughput | 500K+ events/sec per instance |
| ACID | Delta Lake transactions for consistency |
| Caching | Foyer adaptive: 512MB memory + 100GB disk, 7-day TTL, 95%+ hit rate |
| Distributed | DynamoDB-based locking for multi-instance deployments |

Main Table Schema

The otel_logs_and_spans table stores all telemetry in a unified schema:

| Column | Type | Purpose |
| --- | --- | --- |
| name | text | Span/log name (e.g., HTTP endpoint path) |
| id | uuid | Unique identifier |
| project_id | uuid | Tenant/project isolation |
| timestamp | timestamptz | Event timestamp |
| date | date | Partition key |
| hashes | text[] | Trace lookup hashes |
| duration | bigint | Span duration in nanoseconds |
| attributes___http___response___status_code | text | Flattened OTel attributes (triple-underscore separator) |
| attributes___user___id | text | User identity propagation |
| attributes___error___type | text | Error classification |
| kind | text | Span kind (SERVER, CLIENT, INTERNAL, etc.) |
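The triple-underscore columns above can be produced by recursively flattening nested OTel attribute maps. A minimal sketch of the idea (the function name and exact separator handling are illustrative, not Monoscope's actual extraction code):

```python
def flatten_attributes(attrs, prefix="attributes"):
    """Flatten nested attribute dicts into triple-underscore column names."""
    flat = {}
    for key, value in attrs.items():
        # OTel attribute keys use dots (http.response.status_code);
        # both dots and nesting become the ___ separator.
        column = f"{prefix}___{key.replace('.', '___')}"
        if isinstance(value, dict):
            flat.update(flatten_attributes(value, column))
        else:
            flat[column] = str(value)  # the schema stores attribute values as text
    return flat

flat = flatten_attributes({"http": {"response": {"status_code": 500}}, "user.id": "u-42"})
# flat["attributes___http___response___status_code"] == "500"
# flat["attributes___user___id"] == "u-42"
```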

Natural Language Query Engine

Monoscope integrates LLMs to translate plain-English queries into SQL executed against TimeFusion:

  1. User input — "Show me all 500 errors from the payments service yesterday"
  2. LLM translation — converts to a parameterized SQL query targeting otel_logs_and_spans
  3. Query execution — TimeFusion executes with vectorized DataFusion engine
  4. Result visualization — charts, log tables, and trace waterfalls rendered in the UI
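For the example input above, the generated SQL might resemble the following sketch. Table and column names come from the schema described earlier; the date arithmetic is illustrative, and the schema excerpt shown does not include a service-name column, so that filter is omitted here:

```sql
SELECT timestamp, name, attributes___error___type, duration
FROM otel_logs_and_spans
WHERE project_id = $1
  AND attributes___http___response___status_code = '500'
  AND date = CURRENT_DATE - INTERVAL '1 day'
ORDER BY timestamp DESC;
```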

AI Agent Scheduler

Scheduled agents run LLM-powered analysis on telemetry data:

flowchart LR
    Scheduler["Agent Scheduler\n(Haskell)"]
    LLM["LLM API"]
    Data["TimeFusion\nQuery"]
    Detect["Anomaly Detection"]
    Report["Email Report"]
    Alert["Alert Channels"]

    Scheduler -->|"Query + Analyze"| Data
    Data --> LLM
    LLM --> Detect
    Detect -->|"Anomaly found"| Report
    Detect -->|"Critical"| Alert

  • Configurable intervals: hourly, daily, weekly
  • Anomaly detection: volume spikes, error rate changes, latency degradation
  • Email reports: summary of findings delivered to configured recipients
  • Alerting: critical findings routed to Slack, Discord, PagerDuty, or webhooks
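The routing logic above can be sketched as a simple threshold check on error rate relative to a baseline. This is an illustrative stand-in (the thresholds and function name are assumptions), not the Haskell scheduler's actual code:

```python
def route_finding(baseline_error_rate, current_error_rate,
                  report_factor=2.0, critical_factor=5.0):
    """Decide where a detected anomaly should be routed.

    Returns 'none', 'report' (email summary), or 'alert'
    (Slack/Discord/PagerDuty/webhook).
    """
    if baseline_error_rate <= 0:
        # Any errors against a zero baseline are treated as critical.
        return "alert" if current_error_rate > 0 else "none"
    ratio = current_error_rate / baseline_error_rate
    if ratio >= critical_factor:
        return "alert"     # critical: page someone immediately
    if ratio >= report_factor:
        return "report"    # notable: include in the scheduled email report
    return "none"

print(route_finding(0.01, 0.08))  # 8x the baseline → "alert"
```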

Error Fingerprinting

Monoscope groups errors through three stages:

  1. Jaccard similarity — groups errors with similar stack traces using set-based comparison
  2. Embedding-based merging — semantically similar errors are merged even with different text
  3. Framework-error rollup — known framework errors (e.g., Django Http404, Express ECONNREFUSED) are automatically categorized
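The first stage can be sketched with plain set arithmetic over tokenized stack traces. A minimal example (the line-based tokenizer and the 0.7 cutoff are assumptions for illustration; Monoscope's actual tokenization and threshold may differ):

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def same_group(trace_a, trace_b, threshold=0.7):
    """Group two stack traces when their frame sets are similar enough."""
    frames_a = set(trace_a.splitlines())
    frames_b = set(trace_b.splitlines())
    return jaccard(frames_a, frames_b) >= threshold

t1 = "payments.charge\nhttp.handler\ndb.query"
t2 = "payments.charge\nhttp.handler\ndb.execute"
print(same_group(t1, t2))  # 2 shared of 4 distinct frames → 0.5 < 0.7 → False
```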

Session Replay

Browser session recordings synced with backend telemetry:

  1. Browser SDK captures DOM mutations, user interactions, and network requests
  2. Events are batched and sent to Monoscope's ingestion API
  3. Session merging worker combines replay events with backend spans using correlation IDs
  4. Merged sessions are stored in S3 and viewable in the UI alongside traces and logs
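The merge in step 3 can be sketched as a join on a shared correlation ID followed by a sort onto one timeline. Illustrative only; the field names (trace_id, ts) are assumptions, not the worker's actual event shape:

```python
def merge_session(replay_events, backend_spans, trace_id):
    """Combine browser replay events and backend spans that share a
    correlation ID, ordered on a single timeline."""
    timeline = [e for e in replay_events if e["trace_id"] == trace_id]
    timeline += [s for s in backend_spans if s["trace_id"] == trace_id]
    return sorted(timeline, key=lambda item: item["ts"])

replay = [{"trace_id": "t1", "ts": 1, "kind": "click"},
          {"trace_id": "t2", "ts": 2, "kind": "click"}]
spans = [{"trace_id": "t1", "ts": 3, "kind": "SERVER span"}]
merged = merge_session(replay, spans, "t1")
print([item["kind"] for item in merged])  # ['click', 'SERVER span']
```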
