
Operations

Deployment, configuration, scaling, monitoring, and day-2 operations for Monoscope.

Deployment

Docker Compose (Quick Start)

git clone https://github.com/monoscope-tech/monoscope.git
cd monoscope
docker-compose up -d
# Visit http://localhost:8080 (default: admin/changeme)

Docker Compose (Production)

services:
  monoscope:
    image: monoscope/monoscope:v0.5.0
    ports:
      - "8080:8080"
      - "4317:4317"  # OTLP gRPC
    environment:
      - DATABASE_URL=postgresql://monoscope:password@postgres:5432/monoscope
      - S3_BUCKET=your-telemetry-bucket
      - AWS_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - KAFKA_BROKERS=kafka:9092
    depends_on:
      postgres:
        condition: service_healthy
      kafka:
        condition: service_started

  postgres:
    image: timescale/timescaledb:latest-pg18
    environment:
      POSTGRES_USER: monoscope
      POSTGRES_PASSWORD: password
      POSTGRES_DB: monoscope
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U monoscope"]
      interval: 5s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:latest
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qg  # any base64-encoded UUID
    volumes:
      - kafkadata:/var/lib/kafka/data

volumes:
  pgdata:
  kafkadata:
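For production it is also worth giving the Monoscope service a restart policy and its own health check. A sketch of the extra keys to merge into the service definition above (the /healthz path is the same endpoint used for manual health checks later in this page; verify it against your version):

```yaml
# Optional hardening for the monoscope service (merge into the file above).
services:
  monoscope:
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8080/healthz || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
```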

Kubernetes with OTel Operator

# Install OTel Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create Instrumentation CRD pointing at Monoscope
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: monoscope-instrumentation
  namespace: observability
spec:
  exporter:
    endpoint: http://monoscope:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_always_on
EOF

# Annotate deployments for auto-instrumentation. The value references the
# Instrumentation object as <namespace>/<name>, since it lives in a
# different namespace than the workload.
kubectl annotate deployment my-app \
  instrumentation.opentelemetry.io/inject-java="observability/monoscope-instrumentation" \
  --namespace default
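The same opt-in can live declaratively in the workload manifest instead of an imperative annotate command. A sketch of the pod-template annotation (the <namespace>/<name> value is needed whenever the Instrumentation object sits in a different namespace than the workload):

```yaml
# Equivalent declarative form: annotate the pod template in the Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "observability/monoscope-instrumentation"
```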

Configuration

Key Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | Yes | PostgreSQL connection string for metadata |
| S3_BUCKET | Yes | S3-compatible bucket for telemetry data |
| AWS_REGION | Yes | S3 region |
| AWS_ACCESS_KEY_ID | Yes | S3 access key |
| AWS_SECRET_ACCESS_KEY | Yes | S3 secret key |
| KAFKA_BROKERS | Yes | Kafka bootstrap servers |
| OTLP_PORT | No | OTLP gRPC port (default: 4317) |
| HTTP_PORT | No | Web UI port (default: 8080) |
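Pulled together, a minimal environment file covering these variables might look like the following sketch; every value is a placeholder to replace with your own:

```shell
# .env sketch with placeholder values; source it or pass it via --env-file.
export DATABASE_URL="postgresql://monoscope:password@postgres:5432/monoscope"
export S3_BUCKET="your-telemetry-bucket"
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="replace-me"
export AWS_SECRET_ACCESS_KEY="replace-me"
export KAFKA_BROKERS="kafka:9092"
export OTLP_PORT="4317"  # optional; 4317 is the default
export HTTP_PORT="8080"  # optional; 8080 is the default
```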

TimeFusion Standalone Configuration

docker run -d \
  -p 5432:5432 \
  -e AWS_S3_BUCKET=your-bucket \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  timefusion/timefusion:latest
# Connect with any PostgreSQL client on port 5432

TimeFusion Cache Tuning

| Parameter | Default | Description |
| --- | --- | --- |
| Memory cache | 512 MB | In-memory Foyer adaptive cache |
| Disk cache | 100 GB | On-disk cache for warm data |
| TTL | 7 days | Cache entry time-to-live |
| Hit rate target | 95%+ | Expected for hot-data queries |

Scaling

Vertical Scaling

  • Monoscope API: Increase CPU/memory for higher ingestion throughput
  • TimeFusion: Increase cache sizes (memory + disk) for faster queries
  • PostgreSQL: Scale TimescaleDB for metadata operations

Horizontal Scaling

| Component | Strategy |
| --- | --- |
| Monoscope API | Run multiple pods behind a load balancer (stateless) |
| TimeFusion | Multi-instance with DynamoDB distributed locking |
| Kafka | Partition-based horizontal scaling |
| PostgreSQL | Read replicas for query load |

Storage Scaling

  • S3 provides effectively unlimited storage
  • Delta Lake compaction keeps query performance stable
  • Zstandard compression reduces storage 10-20x
  • No data sampling means costs scale linearly with ingest volume
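Because nothing is sampled, capacity planning reduces to simple arithmetic. A back-of-envelope sketch with assumed figures (500 GB/day of raw ingest, 15x compression, 30-day retention; none of these are measured values):

```shell
# Rough S3 footprint estimate; all inputs are assumptions, not measurements.
raw_gb_per_day=500    # uncompressed telemetry ingested per day
compression=15        # within the 10-20x Zstandard range cited above
retention_days=30     # how long data is kept before expiry

stored_gb=$(( raw_gb_per_day * retention_days / compression ))
echo "~${stored_gb} GB resident in S3"   # 500 * 30 / 15 = 1000
```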

Monitoring

Health Checks

# Check Monoscope API
curl -sf http://localhost:8080/healthz

# Check TimeFusion via PostgreSQL wire protocol
psql -h localhost -p 5432 -c "SELECT 1"

# Check OTLP ingestion endpoint (requires gRPC server reflection;
# sends an empty export request)
grpcurl -plaintext -d '{}' localhost:4317 opentelemetry.proto.collector.trace.v1.TraceService/Export

Key Metrics to Watch

| Metric | Alert threshold | Description |
| --- | --- | --- |
| OTLP ingestion rate | Drop > 50% | Pipeline health |
| Kafka consumer lag | > 10,000 messages | Extraction worker backlog |
| S3 write latency | > 500 ms | Storage bottleneck |
| TimeFusion query latency (p99) | > 2 s | Query performance |
| Cache hit rate | < 80% | Cache effectiveness |
| PostgreSQL connections | > 80% of max | Connection pool saturation |
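If you run Prometheus, the consumer-lag row translates into an alerting rule like the sketch below. The metric name assumes the standard kafka-exporter; substitute whatever your deployment actually exports:

```yaml
# Prometheus alerting rule sketch; kafka_consumergroup_lag assumes kafka-exporter.
groups:
  - name: monoscope-alerts
    rules:
      - alert: KafkaConsumerLagHigh
        expr: sum(kafka_consumergroup_lag) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Extraction worker backlog exceeds 10k messages"
```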

Upgrades

# Pull the new version (pin an explicit tag rather than :latest)
docker pull monoscope/monoscope:v0.5.0

# Restart with new version (migrations baked into Docker image)
docker-compose down
docker-compose up -d

# Verify
curl -sf http://localhost:8080/healthz

Migrations run automatically on startup; the Docker image bundles all PL/pgSQL migrations (87KB of SQL).

Backup & Recovery

S3 Data

  • Delta Lake provides ACID transactions and time travel
  • Enable S3 versioning for additional protection
  • Cross-region replication for disaster recovery

PostgreSQL Metadata

pg_dump -h postgres-host -U monoscope -d monoscope > monoscope-metadata-backup.sql
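A dated filename makes rotation and retention easier. A small wrapper sketch (assumes pg_dump is on PATH and credentials come from PGPASSWORD or ~/.pgpass; the dump line is commented out so the sketch is safe to run anywhere):

```shell
# Build a timestamped backup path; the pg_dump invocation mirrors the one above.
backup_file="monoscope-metadata-$(date +%Y-%m-%d).sql"
echo "Writing ${backup_file}"
# pg_dump -h postgres-host -U monoscope -d monoscope -f "${backup_file}"
```

Run it from cron (e.g. nightly) and prune old files with your preferred retention tool.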

Recovery

  1. Restore PostgreSQL from backup
  2. S3 data is durable by default (no recovery needed)
  3. Monoscope replays any missed events from Kafka on restart

Common Issues

| Issue | Cause | Resolution |
| --- | --- | --- |
| OTLP connection refused | Wrong port or missing Bearer token | Verify port 4317 and the API key in the OTel Collector config |
| Empty dashboards | TimeFusion not connected to S3 | Check S3 credentials and bucket access |
| High query latency | High cache miss rate | Increase memory/disk cache sizes |
| Kafka consumer lag | Extraction worker under-resourced | Scale worker instances or increase Kafka partitions |
| Session replay gaps | Browser SDK not capturing | Verify the session replay SDK is initialized correctly |
