# Operations

Deployment, configuration, scaling, monitoring, and day-2 operations for Monoscope.
## Deployment

### Docker Compose (Quick Start)

```bash
git clone https://github.com/monoscope-tech/monoscope.git
cd monoscope
docker-compose up -d
# Visit http://localhost:8080 (default login: admin / changeme)
```
### Docker Compose (Production)

```yaml
services:
  monoscope:
    image: monoscope/monoscope:v0.5.0
    ports:
      - "8080:8080"
      - "4317:4317" # OTLP gRPC
    environment:
      - DATABASE_URL=postgresql://monoscope:password@postgres:5432/monoscope
      - S3_BUCKET=your-telemetry-bucket
      - AWS_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - KAFKA_BROKERS=kafka:9092
    depends_on:
      postgres:
        condition: service_healthy
      kafka:
        condition: service_started

  postgres:
    image: timescale/timescaledb:latest-pg18
    environment:
      POSTGRES_USER: monoscope
      POSTGRES_PASSWORD: password
      POSTGRES_DB: monoscope
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U monoscope"]
      interval: 5s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:latest
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: MkU3OEVhNTcwNTJENDM2Qk # must be a 22-character base64 UUID, e.g. from `kafka-storage random-uuid`
    volumes:
      - kafkadata:/var/lib/kafka/data

volumes:
  pgdata:
  kafkadata:
```
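The `${AWS_ACCESS_KEY_ID}` and `${AWS_SECRET_ACCESS_KEY}` references are interpolated by Compose from the shell environment or from a `.env` file in the project directory; a minimal sketch (the values below are placeholders):

```shell
# .env -- Docker Compose reads this file automatically from the project directory
AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY
AWS_SECRET_ACCESS_KEY=example-secret-value
```

Keep this file out of version control (add `.env` to `.gitignore`).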
### Kubernetes with OTel Operator

```bash
# Install the OTel Operator (requires cert-manager to be installed first)
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create an Instrumentation CRD pointing at Monoscope
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1beta1
kind: Instrumentation
metadata:
  name: monoscope-instrumentation
  namespace: observability
spec:
  exporter:
    endpoint: http://monoscope:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_always_on
EOF

# Enable auto-instrumentation. The annotation must land on the pod template
# (not on the Deployment object itself), so patch spec.template:
kubectl patch deployment my-app --namespace default --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}}}'
```
## Configuration

### Key Environment Variables

| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | PostgreSQL connection string for metadata |
| `S3_BUCKET` | Yes | S3-compatible bucket for telemetry data |
| `AWS_REGION` | Yes | S3 region |
| `AWS_ACCESS_KEY_ID` | Yes | S3 access key |
| `AWS_SECRET_ACCESS_KEY` | Yes | S3 secret key |
| `KAFKA_BROKERS` | Yes | Kafka bootstrap servers |
| `OTLP_PORT` | No | OTLP gRPC port (default: 4317) |
| `HTTP_PORT` | No | Web UI port (default: 8080) |
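Since all six required variables must be present before startup, a small preflight check can fail fast instead of letting the server crash mid-boot. A sketch (`check_env` is a hypothetical helper; the variable names come from the table above):

```shell
# check_env: print "ok" if every named environment variable is set and
# non-empty; otherwise report the first missing one and return non-zero.
check_env() {
  for v in "$@"; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing: $v" >&2
      return 1
    fi
  done
  echo "ok"
}

# Usage:
# check_env DATABASE_URL S3_BUCKET AWS_REGION \
#   AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY KAFKA_BROKERS
```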
### TimeFusion Standalone Configuration

```bash
docker run -d \
  -p 5432:5432 \
  -e AWS_S3_BUCKET=your-bucket \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  timefusion/timefusion:latest

# Connect with any PostgreSQL client on port 5432
```
TimeFusion Cache Tuning
| Parameter |
Default |
Description |
| Memory cache |
512MB |
In-memory Foyer adaptive cache |
| Disk cache |
100GB |
On-disk cache for warm data |
| TTL |
7 days |
Cache entry time-to-live |
| Hit rate target |
95%+ |
Expected for hot data queries |
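The 95% hit-rate target matters because effective read latency is dominated by the miss path. A back-of-envelope check (all latency numbers here are illustrative assumptions, not measured values):

```shell
# effective latency = hit_rate * cached_latency + (1 - hit_rate) * S3_scan_latency
awk -v h=0.95 -v hit_ms=5 -v miss_ms=400 \
  'BEGIN { printf "effective read latency: %.2f ms\n", h*hit_ms + (1-h)*miss_ms }'
# → effective read latency: 24.75 ms
```

Under the same assumptions, dropping to an 80% hit rate more than triples the effective latency, which is why hit rate is worth alerting on.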
## Scaling

### Vertical Scaling

- **Monoscope API**: increase CPU/memory for higher ingestion throughput
- **TimeFusion**: increase cache sizes (memory + disk) for faster queries
- **PostgreSQL**: scale TimescaleDB for metadata operations
Horizontal Scaling
| Component |
Strategy |
| Monoscope API |
Run multiple pods behind load balancer (stateless) |
| TimeFusion |
Multi-instance with DynamoDB distributed locking |
| Kafka |
Partition-based horizontal scaling |
| PostgreSQL |
Read replicas for query load |
### Storage Scaling

- S3 provides effectively unlimited storage
- Delta Lake compaction keeps query performance stable
- Zstandard compression reduces storage 10-20x
- No data sampling, so costs scale linearly with ingest volume
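The last two points combine into simple capacity math. An illustrative estimate (the 15x ratio is a mid-range assumption from the 10-20x figure above; the per-GB price is an example, not a quoted rate):

```shell
# stored = ingested_per_day * days / compression_ratio
awk -v gb_per_day=100 -v days=30 -v ratio=15 -v usd_per_gb=0.023 \
  'BEGIN {
     stored = gb_per_day * days / ratio
     printf "%.0f GB stored, ~$%.2f/month\n", stored, stored * usd_per_gb
   }'
# → 200 GB stored, ~$4.60/month
```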
## Monitoring

### Health Checks

```bash
# Check the Monoscope API
curl -sf http://localhost:8080/healthz

# Check TimeFusion via the PostgreSQL wire protocol
psql -h localhost -p 5432 -c "SELECT 1"

# Check the OTLP ingestion endpoint (sends an empty export request)
grpcurl -plaintext -d '{}' localhost:4317 opentelemetry.proto.collector.trace.v1.TraceService/Export
```
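For scripted deploys it helps to wrap the `healthz` check in a retry loop rather than curling once (a sketch; `wait_healthy` is a hypothetical helper, not part of Monoscope):

```shell
# wait_healthy URL [ATTEMPTS]: poll until the endpoint answers 2xx,
# one attempt per second; return non-zero if it never becomes healthy.
wait_healthy() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "healthy: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "unhealthy after $attempts attempts: $url" >&2
  return 1
}

# Usage: wait_healthy http://localhost:8080/healthz 60
```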
### Key Metrics to Watch

| Metric | Alert Threshold | Description |
|---|---|---|
| OTLP ingestion rate | Drop > 50% | Pipeline health |
| Kafka consumer lag | > 10,000 messages | Extraction worker backlog |
| S3 write latency | > 500 ms | Storage bottleneck |
| TimeFusion query latency (p99) | > 2 s | Query performance |
| Cache hit rate | < 80% | Cache effectiveness |
| PostgreSQL connections | > 80% of max | Connection pool saturation |
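If these metrics are scraped into Prometheus, the thresholds translate directly into alert rules. A sketch for the consumer-lag row (the metric name `kafka_consumergroup_lag` assumes the common kafka-exporter naming; adjust to whatever your exporter actually emits):

```yaml
groups:
  - name: monoscope
    rules:
      - alert: ExtractionWorkerBacklog
        expr: sum(kafka_consumergroup_lag) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Kafka consumer lag above 10k messages; extraction workers are falling behind"
```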
## Upgrades

```bash
# Pull the new image version
docker pull monoscope/monoscope:v0.5.0

# Restart with the new version (migrations are baked into the Docker image)
docker-compose down
docker-compose up -d

# Verify
curl -sf http://localhost:8080/healthz
```

Migrations run automatically on startup; the Docker image includes all PL/pgSQL migrations (87 KB of SQL).
## Backup & Recovery

### S3 Data

- Delta Lake provides ACID transactions and time travel
- Enable S3 versioning for additional protection
- Use cross-region replication for disaster recovery

### PostgreSQL Metadata

```bash
pg_dump -h postgres-host -U monoscope -d monoscope > monoscope-metadata-backup.sql
```

### Recovery

- Restore PostgreSQL from backup
- S3 data is durable by default (no recovery needed)
- Monoscope replays any missed events from Kafka on restart
Common Issues
| Issue |
Cause |
Resolution |
| OTLP connection refused |
Wrong port or missing Bearer token |
Verify port 4317 and API key in OTel Collector config |
| Empty dashboards |
TimeFusion not connected to S3 |
Check S3 credentials and bucket access |
| High query latency |
Cache miss rate high |
Increase memory/disk cache sizes |
| Kafka consumer lag |
Extraction worker under-resourced |
Scale worker instances or increase Kafka partitions |
| Session replay gaps |
Browser SDK not capturing |
Verify session replay SDK is initialized correctly |