Operations
Deployment, configuration, scaling, monitoring, and day-2 operations for Monoscope.
Deployment
Docker Compose (Quick Start)
git clone https://github.com/monoscope-tech/monoscope.git
cd monoscope
docker-compose up -d
# Visit http://localhost:8080 (default: admin/changeme)
Docker Compose (Production)
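services:
  monoscope:
    image: monoscope/monoscope:v0.5.0
    ports:
      - "8080:8080" # Web UI
      - "4317:4317" # OTLP gRPC
    environment:
      - DATABASE_URL=postgresql://monoscope:password@postgres:5432/monoscope
      - S3_BUCKET=your-telemetry-bucket
      - AWS_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - KAFKA_BROKERS=kafka:9092
    depends_on:
      postgres:
        condition: service_healthy
      kafka:
        condition: service_started
  postgres:
    image: timescale/timescaledb:latest-pg18
    environment:
      POSTGRES_USER: monoscope
      POSTGRES_PASSWORD: password # replace before deploying
      POSTGRES_DB: monoscope
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U monoscope"]
      interval: 5s
      timeout: 5s
      retries: 5
  kafka:
    image: confluentinc/cp-kafka:latest
    environment:
      # Single-node KRaft mode: one process acts as broker and controller,
      # which needs a dedicated controller listener and a quorum voter entry.
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      # Must be a base64-encoded UUID; generate your own for production
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
    volumes:
      - kafkadata:/var/lib/kafka/data
volumes:
  pgdata:
  kafkadata: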
Kubernetes with OTel Operator
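# Install OTel Operator (requires cert-manager in the cluster)
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
# Create an Instrumentation resource pointing at Monoscope
# (assumes the Monoscope Service runs in the observability namespace)
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: monoscope-instrumentation
  namespace: observability
spec:
  exporter:
    endpoint: http://monoscope.observability:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_always_on
EOF
# Annotate the pod template for auto-instrumentation. The operator reads pod
# annotations, so patch spec.template rather than the Deployment's own metadata.
# The value is <namespace>/<name> because the Instrumentation lives in another namespace.
kubectl patch deployment my-app --namespace default --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"observability/monoscope-instrumentation"}}}}}'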
Configuration
Key Environment Variables
| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | Yes | PostgreSQL connection string for metadata |
| S3_BUCKET | Yes | S3-compatible bucket for telemetry data |
| AWS_REGION | Yes | S3 region |
| AWS_ACCESS_KEY_ID | Yes | S3 access key |
| AWS_SECRET_ACCESS_KEY | Yes | S3 secret key |
| KAFKA_BROKERS | Yes | Kafka bootstrap servers |
| OTLP_PORT | No | OTLP gRPC port (default: 4317) |
| HTTP_PORT | No | Web UI port (default: 8080) |
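A pre-flight check can catch a missing variable before the container starts. The following is a minimal sketch, assuming only the variable names and defaults listed in the table above; the function names `missing_required` and `effective_ports` are hypothetical helpers, not part of Monoscope.

```python
# Required variables, taken from the table above.
REQUIRED_VARS = [
    "DATABASE_URL",
    "S3_BUCKET",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "KAFKA_BROKERS",
]

# Optional variables and their documented defaults.
OPTIONAL_DEFAULTS = {"OTLP_PORT": "4317", "HTTP_PORT": "8080"}


def missing_required(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def effective_ports(env):
    """Resolve the optional port settings, falling back to the defaults."""
    return {
        name: int(env.get(name) or default)
        for name, default in OPTIONAL_DEFAULTS.items()
    }
```

Calling `missing_required(os.environ)` in an entrypoint script and exiting when the list is non-empty gives a clearer failure than a crash partway through startup.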
TimeFusion Standalone Configuration
docker run -d \
-p 5432:5432 \
-e AWS_S3_BUCKET=your-bucket \
-e AWS_ACCESS_KEY_ID=your-key \
-e AWS_SECRET_ACCESS_KEY=your-secret \
timefusion/timefusion:latest
# Connect with any PostgreSQL client on port 5432
TimeFusion Cache Tuning
| Parameter | Default | Description |
| --- | --- | --- |
| Memory cache | 512 MB | In-memory Foyer adaptive cache |
| Disk cache | 100 GB | On-disk cache for warm data |
| TTL | 7 days | Cache entry time-to-live |
| Hit rate target | 95%+ | Expected for hot data queries |
Scaling
Vertical Scaling
- Monoscope API: Increase CPU/memory for higher ingestion throughput
- TimeFusion: Increase cache sizes (memory + disk) for faster queries
- PostgreSQL: Give TimescaleDB more CPU/memory for metadata operations
Horizontal Scaling
| Component | Strategy |
| --- | --- |
| Monoscope API | Run multiple pods behind a load balancer (stateless) |
| TimeFusion | Multi-instance with DynamoDB distributed locking |
| Kafka | Partition-based horizontal scaling |
| PostgreSQL | Read replicas for query load |
Storage Scaling
- S3 provides effectively unlimited storage
- Delta Lake compaction keeps query performance stable
- Zstandard compression reduces storage 10-20x
- No data sampling means costs scale linearly with ingest volume
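The linear relationship between ingest volume and storage can be made concrete with a little arithmetic. This is a rough sketch under stated assumptions: the compression ratio is taken from the 10-20x range above, while the retention window and the per-GB price are inputs you supply (no price is given in this document).

```python
def stored_gb(daily_ingest_gb, retention_days, compression_ratio):
    """Estimate compressed data at rest for a fixed retention window."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_ingest_gb * retention_days / compression_ratio


def monthly_cost_usd(gb_at_rest, usd_per_gb_month):
    """Storage cost per month for a given amount of data at rest."""
    return gb_at_rest * usd_per_gb_month
```

For example, 100 GB/day of raw telemetry kept for 30 days occupies roughly 300 GB at 10x compression and 150 GB at 20x. Doubling the ingest volume doubles both figures, which is the linear scaling the last bullet describes.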
Monitoring
Health Checks
# Check Monoscope API
curl -sf http://localhost:8080/healthz
# Check TimeFusion via PostgreSQL wire protocol
psql -h localhost -p 5432 -c "SELECT 1"
# Check OTLP ingestion endpoint (grpcurl needs gRPC server reflection on the endpoint, or the OTLP .proto files passed via -proto)
grpcurl -plaintext localhost:4317 opentelemetry.proto.collector.trace.v1.TraceService/Export
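Where `psql` or `grpcurl` are not installed, the same checks can be approximated with the Python standard library. This is a minimal sketch, assuming the default ports used in this document; `http_ok`, `tcp_open`, and `check_all` are hypothetical helper names. A TCP connect only proves the port is accepting connections, so it is weaker than running `SELECT 1` or a real OTLP export.

```python
import socket
import urllib.error
import urllib.request


def http_ok(url, timeout=3.0):
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def tcp_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(host="localhost"):
    """Run the three checks against the default ports from this document."""
    return {
        "api": http_ok(f"http://{host}:8080/healthz"),
        "timefusion": tcp_open(host, 5432),
        "otlp": tcp_open(host, 4317),
    }
```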
Key Metrics to Watch
| Metric | Alert Threshold | Description |
| --- | --- | --- |
| OTLP ingestion rate | Drop > 50% | Pipeline health |
| Kafka consumer lag | > 10,000 messages | Extraction worker backlog |
| S3 write latency | > 500ms | Storage bottleneck |
| TimeFusion query latency (p99) | > 2s | Query performance |
| Cache hit rate | < 80% | Cache effectiveness |
| PostgreSQL connections | > 80% of max | Connection pool saturation |
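The thresholds above translate directly into alert rules. The sketch below shows one way to evaluate them against a snapshot of metric values. The dictionary keys are hypothetical names chosen for this example, and "Drop > 50%" is interpreted here as the current rate falling below half of a baseline rate that you provide.

```python
def evaluate(m):
    """Return the names of metrics that breach the documented thresholds.

    The keys of m are hypothetical names used only in this sketch.
    """
    alerts = []
    baseline = m["otlp_rate_baseline"]
    # Ingestion rate dropped by more than 50% relative to the baseline.
    if baseline > 0 and m["otlp_rate"] < 0.5 * baseline:
        alerts.append("otlp_ingestion_rate")
    if m["kafka_consumer_lag"] > 10_000:
        alerts.append("kafka_consumer_lag")
    if m["s3_write_latency_ms"] > 500:
        alerts.append("s3_write_latency")
    if m["query_latency_p99_s"] > 2:
        alerts.append("timefusion_query_latency_p99")
    if m["cache_hit_rate"] < 0.80:
        alerts.append("cache_hit_rate")
    if m["pg_connections"] > 0.8 * m["pg_max_connections"]:
        alerts.append("postgresql_connections")
    return alerts
```

In practice these rules would usually live in the alerting system itself; the function is only meant to pin down what each threshold means.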
Upgrades
# Pull the target version (set the same tag in your compose file)
docker pull monoscope/monoscope:v0.5.0
# Restart with the new version (migrations are baked into the Docker image)
docker-compose down
docker-compose up -d
# Verify
curl -sf http://localhost:8080/healthz
Migrations run automatically on startup. The Docker image includes all PL/pgSQL migrations (87KB of SQL).
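Because migrations run during startup, the health endpoint may not answer immediately after a restart. A small polling loop avoids a false failure from checking too early. This is a sketch only: `wait_until` is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions rather than documented values.

```python
import time


def wait_until(check, timeout=120.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout seconds have elapsed.

    Returns True on success and False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `check` argument can be any zero-argument callable, for instance one that requests `http://localhost:8080/healthz` and returns True on a 2xx response.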
Backup & Recovery
S3 Data
- Delta Lake provides ACID transactions and time travel
- Enable S3 versioning for additional protection
- Cross-region replication for disaster recovery
PostgreSQL Metadata
pg_dump -h postgres-host -U monoscope -d monoscope > monoscope-metadata-backup.sql
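For scheduled backups it helps to give each dump a timestamped name so that runs do not overwrite each other. The sketch below builds the same `pg_dump` invocation as above, but writes through `-f` instead of shell redirection. The helper names and the file-name pattern are assumptions made for illustration.

```python
import datetime


def backup_path(prefix="monoscope-metadata-backup", now=None):
    """Return a file name with a UTC timestamp, e.g. prefix-20240102T030405Z.sql."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return f"{prefix}-{now:%Y%m%dT%H%M%SZ}.sql"


def pg_dump_command(host, user, dbname, out_path):
    """Build the pg_dump argument list for a plain SQL dump written to out_path."""
    return ["pg_dump", "-h", host, "-U", user, "-d", dbname, "-f", out_path]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, with the password supplied through `PGPASSWORD` or a `.pgpass` file.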
Recovery
- Restore PostgreSQL from backup
- S3 data is durable by default; recovery is only needed after accidental deletion or corruption, which S3 versioning and Delta Lake time travel cover
- Monoscope replays any missed events from Kafka on restart
Common Issues
| Issue | Cause | Resolution |
| --- | --- | --- |
| OTLP connection refused or rejected | Wrong port or missing Bearer token | Verify port 4317 and the API key in the OTel Collector config |
| Empty dashboards | TimeFusion not connected to S3 | Check S3 credentials and bucket access |
| High query latency | High cache miss rate | Increase memory/disk cache sizes |
| Kafka consumer lag | Extraction worker under-resourced | Scale worker instances or increase Kafka partitions |
| Session replay gaps | Browser SDK not capturing | Verify the session replay SDK is initialized correctly |
Commands & Recipes
Docker commands, SDK setup, OTel Collector configuration, and API snippets for Monoscope.
Docker Commands
Quick Start
git clone https://github.com/monoscope-tech/monoscope.git
cd monoscope
docker-compose up -d
# http://localhost:8080 (admin/changeme)
Run TimeFusion Standalone
docker run -d --name timefusion \
-p 5432:5432 \
-e AWS_S3_BUCKET=your-bucket \
-e AWS_ACCESS_KEY_ID=your-key \
-e AWS_SECRET_ACCESS_KEY=your-secret \
timefusion/timefusion:latest
# Query with any PostgreSQL client
psql -h localhost -p 5432 -c "SELECT * FROM otel_logs_and_spans LIMIT 10;"
OpenTelemetry Collector Configuration
Basic OTLP Export to Monoscope
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  # Port 4317 is OTLP gRPC, so use the gRPC otlp exporter rather than otlphttp
  otlp:
    endpoint: monoscope:4317
    tls:
      insecure: true # plaintext; remove once TLS is enabled on Monoscope
    headers:
      Authorization: "Bearer YOUR_API_KEY"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [otlp]
Multi-Source Collector
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-app'
          scrape_interval: 15s
          static_configs:
            - targets: ['my-app:8080']
exporters:
  otlphttp:
    endpoint: https://app.monoscope.tech
    headers:
      Authorization: "Bearer YOUR_API_KEY"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp, prometheus]
      exporters: [otlphttp]
SDK Setup
Python (Flask)
from apitoolkit_flask import observe_app
from flask import Flask

app = Flask(__name__)
observe_app(app, api_key="YOUR_API_KEY")

@app.route("/api/users")
def get_users():
    return {"users": []}
Python (Django)
# settings.py
APITOOLKIT = {
    "API_KEY": "YOUR_API_KEY",
}

# Register the middleware
MIDDLEWARE = [
    "apitoolkit_django.APIToolkitDjangoMiddleware",
    # ... other middleware
]
Node.js (Express)
const express = require("express");
const { observe } = require("apitoolkit-express");

const app = express();
observe(app, { apiKey: "YOUR_API_KEY" });

app.get("/api/users", (req, res) => {
  res.json({ users: [] });
});

app.listen(3000);
Go
package main

import (
	"log"
	"net/http"

	monoscope "github.com/monoscope-tech/monoscope-go"
)

func main() {
	client, err := monoscope.NewClient(monoscope.Config{
		APIKey: "YOUR_API_KEY",
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	mux := http.NewServeMux()
	mux.HandleFunc("/api/users", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(`{"users":[]}`))
	})

	// ListenAndServe only returns on failure, so log the error instead of dropping it.
	if err := http.ListenAndServe(":8080", client.Middleware(mux)); err != nil {
		log.Print(err)
	}
}
Java (Spring Boot)
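# Add the OTel Java agent at startup.
# Agent 2.x defaults to http/protobuf, so select gRPC explicitly for port 4317.
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.exporter.otlp.endpoint=http://monoscope:4317 \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.headers="Authorization=Bearer YOUR_API_KEY" \
  -Dotel.service.name=my-service \
  -jar my-app.jar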
PHP (Laravel)
// config/app.php: add the service provider
'providers' => [
    APIToolkit\Laravel\APIToolkitServiceProvider::class,
],

# .env
APITOOLKIT_API_KEY=YOUR_API_KEY
Kubernetes Auto-Instrumentation
Install OTel Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
Create Instrumentation Resource
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: monoscope
  namespace: observability
spec:
  exporter:
    endpoint: http://monoscope.observability:4317
  propagators:
    - tracecontext
    - baggage
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
Annotate Deployments
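# The operator reads annotations on the pod template, so patch spec.template
# rather than annotating the Deployment object itself. The value is
# <namespace>/<name> of the Instrumentation resource, which also supplies the
# exporter endpoint.
# Java app
kubectl patch deployment my-java-app --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"observability/monoscope"}}}}}'
# Python app
kubectl patch deployment my-python-app --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-python":"observability/monoscope"}}}}}'
# Node.js app
kubectl patch deployment my-node-app --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-nodejs":"observability/monoscope"}}}}}'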
Alert Channel Configuration
Slack Webhook
# Configure in Monoscope UI or API
# Settings -> Alert Channels -> Add Slack
# Provide webhook URL: https://hooks.slack.com/services/T.../B.../xxx
# Use slash command: /monoscope-here in channel
Discord Webhook
# Settings -> Alert Channels -> Add Discord
# Provide webhook URL: https://discord.com/api/webhooks/.../...
# Daily admin summary enabled automatically
PagerDuty
# Settings -> Alert Channels -> Add PagerDuty
# Provide integration key from PagerDuty service
Querying TimeFusion
# Connect to TimeFusion via the PostgreSQL wire protocol, then run the queries below
psql -h localhost -p 5432
-- Recent errors
SELECT timestamp, name, duration, attributes___error___type
FROM otel_logs_and_spans
WHERE attributes___http___response___status_code LIKE '5%'
AND timestamp > NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC
LIMIT 50;
-- Top slowest endpoints (last 24 hours)
SELECT name, COUNT(*) as requests,
AVG(duration) / 1000000 as avg_ms,
MAX(duration) / 1000000 as max_ms
FROM otel_logs_and_spans
WHERE kind = 'SERVER'
AND timestamp > NOW() - INTERVAL '24 hours'
GROUP BY name
ORDER BY avg_ms DESC
LIMIT 20;
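The column names in these queries look like OpenTelemetry attribute keys with each dot replaced by three underscores and an `attributes` prefix. That rule is inferred from the two columns shown (`attributes___error___type` and `attributes___http___response___status_code`), not from a documented schema, so treat the helper below as a hypothetical convenience. Likewise, dividing `duration` by 1000000 to get milliseconds suggests the column stores nanoseconds, which is an assumption here.

```python
PREFIX = "attributes"
SEPARATOR = "___"


def attribute_column(attr_key):
    """Map an attribute key such as 'error.type' to its flattened column name."""
    return SEPARATOR.join([PREFIX, *attr_key.split(".")])


def ns_to_ms(duration_ns):
    """Convert a duration in nanoseconds to milliseconds."""
    return duration_ns / 1_000_000
```

If the inferred rule holds, `attribute_column("http.request.method")` would give the column to filter on for the request method; check the actual table schema before relying on it.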