
Operations

Deployment, configuration, scaling, monitoring, and day-2 operations for Monoscope.

Deployment

Docker Compose (Quick Start)

git clone https://github.com/monoscope-tech/monoscope.git
cd monoscope
docker-compose up -d
# Visit http://localhost:8080 (default: admin/changeme)

Docker Compose (Production)

services:
  monoscope:
    image: monoscope/monoscope:v0.5.0
    ports:
      - "8080:8080"
      - "4317:4317"  # OTLP gRPC
    environment:
      - DATABASE_URL=postgresql://monoscope:password@postgres:5432/monoscope
      - S3_BUCKET=your-telemetry-bucket
      - AWS_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - KAFKA_BROKERS=kafka:9092
    depends_on:
      postgres:
        condition: service_healthy
      kafka:
        condition: service_started

  postgres:
    image: timescale/timescaledb:latest-pg18
    environment:
      POSTGRES_USER: monoscope
      POSTGRES_PASSWORD: password
      POSTGRES_DB: monoscope
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U monoscope"]
      interval: 5s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:latest
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qg  # any base64-encoded UUID
    volumes:
      - kafkadata:/var/lib/kafka/data

volumes:
  pgdata:
  kafkadata:
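For production it is also worth giving the Monoscope service a restart policy and its own health check. A sketch of the extra keys to merge into the service definition above (the /healthz path is the same endpoint used for manual health checks later in this page; verify it against your version):

```yaml
# Optional hardening for the monoscope service (merge into the file above).
services:
  monoscope:
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8080/healthz || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
```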

Kubernetes with OTel Operator

# Install OTel Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create Instrumentation CRD pointing at Monoscope
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: monoscope-instrumentation
  namespace: observability
spec:
  exporter:
    endpoint: http://monoscope:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_always_on
EOF

# Annotate deployments for auto-instrumentation. The value references the
# Instrumentation object as <namespace>/<name>, since it lives in a
# different namespace than the workload.
kubectl annotate deployment my-app \
  instrumentation.opentelemetry.io/inject-java="observability/monoscope-instrumentation" \
  --namespace default
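The same opt-in can live declaratively in the workload manifest instead of an imperative annotate command. A sketch of the pod-template annotation (the <namespace>/<name> value is needed whenever the Instrumentation object sits in a different namespace than the workload):

```yaml
# Equivalent declarative form: annotate the pod template in the Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "observability/monoscope-instrumentation"
```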

Configuration

Key Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | Yes | PostgreSQL connection string for metadata |
| S3_BUCKET | Yes | S3-compatible bucket for telemetry data |
| AWS_REGION | Yes | S3 region |
| AWS_ACCESS_KEY_ID | Yes | S3 access key |
| AWS_SECRET_ACCESS_KEY | Yes | S3 secret key |
| KAFKA_BROKERS | Yes | Kafka bootstrap servers |
| OTLP_PORT | No | OTLP gRPC port (default: 4317) |
| HTTP_PORT | No | Web UI port (default: 8080) |
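Pulled together, a minimal environment file covering these variables might look like the following sketch; every value is a placeholder to replace with your own:

```shell
# .env sketch with placeholder values; source it or pass it via --env-file.
export DATABASE_URL="postgresql://monoscope:password@postgres:5432/monoscope"
export S3_BUCKET="your-telemetry-bucket"
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="replace-me"
export AWS_SECRET_ACCESS_KEY="replace-me"
export KAFKA_BROKERS="kafka:9092"
export OTLP_PORT="4317"  # optional; 4317 is the default
export HTTP_PORT="8080"  # optional; 8080 is the default
```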

TimeFusion Standalone Configuration

docker run -d \
  -p 5432:5432 \
  -e AWS_S3_BUCKET=your-bucket \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  timefusion/timefusion:latest
# Connect with any PostgreSQL client on port 5432

TimeFusion Cache Tuning

| Parameter | Default | Description |
| --- | --- | --- |
| Memory cache | 512 MB | In-memory Foyer adaptive cache |
| Disk cache | 100 GB | On-disk cache for warm data |
| TTL | 7 days | Cache entry time-to-live |
| Hit rate target | 95%+ | Expected for hot-data queries |

Scaling

Vertical Scaling

  • Monoscope API: Increase CPU/memory for higher ingestion throughput
  • TimeFusion: Increase cache sizes (memory + disk) for faster queries
  • PostgreSQL: Scale TimescaleDB for metadata operations

Horizontal Scaling

| Component | Strategy |
| --- | --- |
| Monoscope API | Run multiple pods behind a load balancer (stateless) |
| TimeFusion | Multi-instance with DynamoDB distributed locking |
| Kafka | Partition-based horizontal scaling |
| PostgreSQL | Read replicas for query load |

Storage Scaling

  • S3 provides effectively unlimited storage
  • Delta Lake compaction keeps query performance stable
  • Zstandard compression reduces storage 10-20x
  • No data sampling means costs scale linearly with ingest volume
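Because nothing is sampled, capacity planning reduces to simple arithmetic. A back-of-envelope sketch with assumed figures (500 GB/day of raw ingest, 15x compression, 30-day retention; none of these are measured values):

```shell
# Rough S3 footprint estimate; all inputs are assumptions, not measurements.
raw_gb_per_day=500    # uncompressed telemetry ingested per day
compression=15        # within the 10-20x Zstandard range cited above
retention_days=30     # how long data is kept before expiry

stored_gb=$(( raw_gb_per_day * retention_days / compression ))
echo "~${stored_gb} GB resident in S3"   # 500 * 30 / 15 = 1000
```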

Monitoring

Health Checks

# Check Monoscope API
curl -sf http://localhost:8080/healthz

# Check TimeFusion via PostgreSQL wire protocol
psql -h localhost -p 5432 -c "SELECT 1"

# Check OTLP ingestion endpoint (requires gRPC server reflection;
# sends an empty export request)
grpcurl -plaintext -d '{}' localhost:4317 opentelemetry.proto.collector.trace.v1.TraceService/Export

Key Metrics to Watch

| Metric | Alert threshold | Description |
| --- | --- | --- |
| OTLP ingestion rate | Drop > 50% | Pipeline health |
| Kafka consumer lag | > 10,000 messages | Extraction worker backlog |
| S3 write latency | > 500 ms | Storage bottleneck |
| TimeFusion query latency (p99) | > 2 s | Query performance |
| Cache hit rate | < 80% | Cache effectiveness |
| PostgreSQL connections | > 80% of max | Connection pool saturation |
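If you run Prometheus, the consumer-lag row translates into an alerting rule like the sketch below. The metric name assumes the standard kafka-exporter; substitute whatever your deployment actually exports:

```yaml
# Prometheus alerting rule sketch; kafka_consumergroup_lag assumes kafka-exporter.
groups:
  - name: monoscope-alerts
    rules:
      - alert: KafkaConsumerLagHigh
        expr: sum(kafka_consumergroup_lag) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Extraction worker backlog exceeds 10k messages"
```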

Upgrades

# Pull the new version (pin an explicit tag rather than :latest)
docker pull monoscope/monoscope:v0.5.0

# Restart with new version (migrations baked into Docker image)
docker-compose down
docker-compose up -d

# Verify
curl -sf http://localhost:8080/healthz

Migrations run automatically on startup; the Docker image bundles all PL/pgSQL migrations (87KB of SQL).

Backup & Recovery

S3 Data

  • Delta Lake provides ACID transactions and time travel
  • Enable S3 versioning for additional protection
  • Cross-region replication for disaster recovery

PostgreSQL Metadata

pg_dump -h postgres-host -U monoscope -d monoscope > monoscope-metadata-backup.sql
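A dated filename makes rotation and retention easier. A small wrapper sketch (assumes pg_dump is on PATH and credentials come from PGPASSWORD or ~/.pgpass; the dump line is commented out so the sketch is safe to run anywhere):

```shell
# Build a timestamped backup path; the pg_dump invocation mirrors the one above.
backup_file="monoscope-metadata-$(date +%Y-%m-%d).sql"
echo "Writing ${backup_file}"
# pg_dump -h postgres-host -U monoscope -d monoscope -f "${backup_file}"
```

Run it from cron (e.g. nightly) and prune old files with your preferred retention tool.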

Recovery

  1. Restore PostgreSQL from backup
  2. S3 data is durable by default (no recovery needed)
  3. Monoscope replays any missed events from Kafka on restart

Common Issues

| Issue | Cause | Resolution |
| --- | --- | --- |
| OTLP connection refused | Wrong port or missing Bearer token | Verify port 4317 and the API key in the OTel Collector config |
| Empty dashboards | TimeFusion not connected to S3 | Check S3 credentials and bucket access |
| High query latency | High cache miss rate | Increase memory/disk cache sizes |
| Kafka consumer lag | Extraction worker under-resourced | Scale worker instances or increase Kafka partitions |
| Session replay gaps | Browser SDK not capturing | Verify the session replay SDK is initialized correctly |
