Operations¶

Deployment & Typical Setup¶

Single-Node (Simplest Production Path)¶

# VictoriaMetrics: single binary, metrics (a bare -retentionPeriod number is counted in months, so 12 means 12 months)
./victoria-metrics -storageDataPath=/data/vm -retentionPeriod=12

# VictoriaLogs — single binary, logs
./victoria-logs -storageDataPath=/data/vl -retentionPeriod=30d

# VictoriaTraces — single binary, traces
./victoria-traces -storageDataPath=/data/vt

Each binary starts an HTTP server and accepts data as soon as it is up. No configuration files are needed for basic usage.

Kubernetes (vmoperator)¶

The recommended production path uses the vmoperator with CRDs:

# Install the operator
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update
helm install vmoperator vm/victoria-metrics-operator -n monitoring --create-namespace

# Deploy cluster via CRD
kubectl apply -f - <<EOF
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vm-cluster
spec:
  retentionPeriod: "12"
  replicationFactor: 2
  vminsert:
    replicaCount: 2
    resources:
      requests: { cpu: "500m", memory: "512Mi" }
  vmselect:
    replicaCount: 2
    resources:
      requests: { cpu: "500m", memory: "1Gi" }
  vmstorage:
    replicaCount: 3
    storageDataPath: /vm-data
    resources:
      requests: { cpu: "1", memory: "4Gi" }
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests: { storage: 100Gi }
          storageClassName: fast-ssd
EOF

Production Readiness Checklist¶

Configuration & Optimal Tuning¶

vmauth Routing Configuration¶

The single most important config file — routes traffic across all three databases:

# vmauth-config.yaml
unauthorized_user:
  url_map:
    # === METRICS ===
    - src_paths:
        - "/api/v1/write"
        - "/api/v1/import.*"
      url_prefix: "http://vminsert:8480/insert/0/prometheus"
    - src_paths:
        - "/api/v1/query.*"
        - "/api/v1/series.*"
        - "/api/v1/labels.*"
      url_prefix: "http://vmselect:8481/select/0/prometheus"

    # === LOGS ===
    - src_paths:
        - "/insert/jsonline.*"
        - "/insert/elasticsearch.*"
        - "/insert/loki/api/v1/push"
      url_prefix: "http://victorialogs:9428"
    - src_paths:
        - "/select/logsql/.*"
      url_prefix: "http://victorialogs:9428"

    # === TRACES ===
    - src_paths:
        - "/insert/opentelemetry/.*"
      url_prefix: "http://victoriatraces:10428"
    - src_paths:
        - "/api/traces.*"
        - "/api/services.*"
      url_prefix: "http://victoriatraces:10428"
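
The routing idea behind this config can be checked offline before deploying it. The following is a minimal Python sketch, not vmauth's actual implementation: it assumes each `src_paths` entry is a regular expression that must match the whole request path, that routes are tried in order, and that the matched path is appended to `url_prefix`. The `resolve_backend` helper is hypothetical and covers only a subset of the routes above.

```python
import re

# Subset of the url_map above: (src_paths, url_prefix), checked in order.
URL_MAP = [
    (["/api/v1/write", "/api/v1/import.*"],
     "http://vminsert:8480/insert/0/prometheus"),
    (["/api/v1/query.*", "/api/v1/series.*", "/api/v1/labels.*"],
     "http://vmselect:8481/select/0/prometheus"),
    (["/select/logsql/.*"], "http://victorialogs:9428"),
    (["/insert/opentelemetry/.*"], "http://victoriatraces:10428"),
]


def resolve_backend(path):
    """Return the backend URL for a request path, or None if no route matches."""
    for src_paths, url_prefix in URL_MAP:
        # Assumption: a pattern has to match the entire path, not a prefix.
        if any(re.fullmatch(pattern, path) for pattern in src_paths):
            return url_prefix + path
    return None


for candidate in ["/api/v1/query_range", "/select/logsql/query", "/unknown"]:
    print(candidate, "->", resolve_backend(candidate))
```

A check like this makes it easy to spot a client path that no route covers before the request ever reaches the proxy.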

Critical Tuning Flags¶

Component	Flag	Purpose	Default
All	`-retentionPeriod`	Data retention duration	1 month
vmstorage	`-search.maxUniqueTimeseries`	Prevent OOM on high-cardinality queries	300,000
vmstorage	`-memory.allowedPercent`	Share of system RAM that VictoriaMetrics caches may occupy	60%
vmselect	`-search.maxQueryDuration`	Max single query execution time	30s
vminsert	`-replicationFactor=N`	Replicate data to N storage nodes	1
vmselect	`-dedup.minScrapeInterval`	Deduplicate data when RF > 1	0s
vmagent	`-remoteWrite.label`	Add global labels to all scraped metrics	—
VictoriaLogs	`-retentionPeriod`	Log retention	7d
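
The `-retentionPeriod` values used in this document mix bare numbers (`12`) and suffixed durations (`30d`). The sketch below shows how such values can be read, assuming the VictoriaMetrics convention that `h`, `d`, `w` and `y` suffixes mean hours, days, weeks and years, and that a bare number is counted in months. `parse_retention` is a hypothetical helper for illustration, not part of any VictoriaMetrics tool.

```python
import re

# Assumed suffix meanings; a missing suffix falls back to months.
UNITS = {"h": "hours", "d": "days", "w": "weeks", "y": "years"}


def parse_retention(value):
    """Split a retention value such as '12' or '30d' into (number, unit)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([hdwy]?)", value)
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    number, suffix = match.groups()
    return float(number), UNITS.get(suffix, "months")


for raw in ["12", "30d", "2w"]:
    print(raw, "->", parse_retention(raw))
```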

Reliability & Scaling¶

Scaling Decision Matrix¶

Symptom	Component to Scale	How
Slow metric queries	vmselect	Add replicas
Write backpressure	vminsert	Add replicas
Disk full on metrics	vmstorage	Add nodes or increase disk
High RAM on storage	vmstorage	Lower `-memory.allowedPercent`, reduce cardinality
Slow log search	VictoriaLogs	Add CPU/RAM (single-node) or cluster
Log ingestion lag	VictoriaLogs	Increase resources or switch to cluster

High Availability¶

Mechanism	Implementation
Metrics replication	`-replicationFactor=2` on vminsert + `-dedup.minScrapeInterval` on vmselect
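Metrics availability	If 1 vmstorage node fails with RF=2, the remaining replicas still hold every sample; start vmselect with `-replicationFactor=2` as well so it does not flag such responses as partial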
Logs/Traces HA	Deploy cluster mode with vlinsert/vlstorage/vlselect
Proxy HA	Multiple vmauth replicas behind load balancer
Backup	vmbackup creates instant, consistent snapshots without locking the DB
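
Replication only helps if the cluster has enough vmstorage nodes to keep writing all copies while some nodes are down. The sketch below encodes the sizing rule as I understand it from the cluster documentation (at least 2×RF − 1 vmstorage nodes to sustain RF copies while RF − 1 nodes are unavailable). Treat the rule and the helper names as assumptions to verify against the version you run.

```python
def min_storage_nodes(replication_factor):
    """Smallest vmstorage count assumed to sustain the given replication factor."""
    return 2 * replication_factor - 1


def tolerated_failures(replication_factor):
    """Number of vmstorage nodes that can be down without losing a copy."""
    return replication_factor - 1


def cluster_is_sized_for(nodes, replication_factor):
    """True if the node count meets the assumed minimum."""
    return nodes >= min_storage_nodes(replication_factor)


# The VMCluster example in this document: 3 vmstorage replicas, RF=2.
print(min_storage_nodes(2), tolerated_failures(2), cluster_is_sized_for(3, 2))
```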

Cost¶

Cost Drivers¶

Factor	Driver	Optimization
Compute	Insert + select pods	Right-size, use spot nodes for vmselect
Storage	Data volume × retention	ZSTD compression reduces 2–7x naturally, tune retention
Network	Internal cluster traffic	Co-locate in same AZ
NO object storage	Local SSD only	Eliminates S3/GCS egress costs entirely
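
The "data volume × retention" driver can be turned into a rough back-of-the-envelope estimate. The sketch below is only that: the bytes-per-sample figure is a placeholder assumption (real compression depends heavily on the data), and `estimate_storage_bytes` is a hypothetical helper, not a VictoriaMetrics formula.

```python
def estimate_storage_bytes(active_series, scrape_interval_s, retention_days,
                           bytes_per_sample=1.0, replication_factor=1):
    """Rough on-disk size: samples per day x retention x size x copies."""
    samples_per_series_per_day = 86400 / scrape_interval_s
    return (active_series * samples_per_series_per_day * retention_days
            * bytes_per_sample * replication_factor)


# Example: 1M series scraped every 30s, kept for 365 days, RF=2,
# assuming roughly 1 byte per compressed sample.
size = estimate_storage_bytes(1_000_000, 30, 365, 1.0, 2)
print(f"{size / 1e12:.2f} TB")
```

Measuring `vm_data_size_bytes` on a running instance and dividing by the stored sample count gives a better per-sample figure than any default.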

Cost at Scale (Self-Hosted)¶

Scale	Active Series	Logs (GB/day)	Estimated Monthly
Small	100k	10	$100–300
Medium	1M	100	$500–1,500
Large	10M	1 TB	$2,000–8,000
Enterprise	100M+	10 TB+	$10,000–50,000

VictoriaMetrics Cloud Pricing¶

Tier	Starting Cost	Includes
Single-node	~$225/mo	Up to 500k active series, 1-month retention
Cluster	~$1,300/mo	Multi-tenancy, HA, advanced networking

Security¶

Authentication & Authorization¶

The databases themselves do not implement RBAC natively.
Security relies strictly on vmauth, which acts as the gatekeeper:
Bearer token authentication
Basic auth
URL-based access control
Header manipulation
Enterprise: SSO integration in vmauth
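
From the client side, bearer token authentication amounts to sending an `Authorization` header with each request to vmauth. A minimal sketch, assuming vmauth listens on `http://vmauth:8427` (as in the Docker Compose example) and has a user entry for the token; the token value and the `build_query_request` helper are hypothetical. The request is only constructed here, not sent.

```python
import urllib.request
from urllib.parse import urlencode


def build_query_request(token, query="up", base_url="http://vmauth:8427"):
    """Build (but do not send) an authenticated instant-query request."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': query})}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


request = build_query_request("example-token")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the call against a reachable vmauth instance.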

Network Security Best Practices¶

Never expose ingestion nodes to the internet — always put vmauth or NGINX in front
Use Kubernetes NetworkPolicies to restrict pod-to-pod communication
Only vmauth should be externally accessible
Use mTLS between components in sensitive environments
Cluster multi-tenancy: Data isolation via account IDs in URL paths (/insert/TENANT_ID/)
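
Because tenant isolation is carried in the URL path, each tenant's clients simply use a different path segment. The sketch below builds per-tenant endpoints following the `/insert/0/prometheus` and `/select/0/prometheus` prefixes used elsewhere in this document; the host names are the ones from the examples and `tenant_urls` is a hypothetical helper.

```python
def tenant_urls(account_id,
                vminsert="http://vminsert:8480",
                vmselect="http://vmselect:8481"):
    """Return the write and query endpoints for one tenant (account ID)."""
    if account_id < 0:
        raise ValueError("account_id must be non-negative")
    return {
        "write": f"{vminsert}/insert/{account_id}/prometheus/api/v1/write",
        "query": f"{vmselect}/select/{account_id}/prometheus/api/v1/query",
    }


print(tenant_urls(0)["write"])
print(tenant_urls(42)["query"])
```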

Best Practices¶

Metrics¶

Global Relabeling: Append datacenter/environment labels at the vmagent layer before data hits storage
Drop high-cardinality labels: Use vmagent relabeling to drop labels like pod_ip, request_id before ingestion
Recording rules: Precompute expensive MetricsQL expressions via vmalert
Deduplication: With replication, always set -dedup.minScrapeInterval on vmselect
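
To see what dropping high-cardinality labels does to a series, here is a small Python simulation of a Prometheus-style `labeldrop` relabeling step, where every label whose name fully matches the regex is removed. It models the effect only; in practice the rule lives in the vmagent relabeling config. The sample label values are made up.

```python
import re


def labeldrop(labels, pattern):
    """Return a copy of labels without the names that fully match pattern."""
    regex = re.compile(pattern)
    return {name: value for name, value in labels.items()
            if not regex.fullmatch(name)}


series = {
    "__name__": "http_requests_total",
    "job": "api",
    "pod_ip": "10.0.0.7",        # illustrative value
    "request_id": "abc123",      # illustrative value
}
print(labeldrop(series, "pod_ip|request_id"))
```

After the drop, requests that differed only by `pod_ip` or `request_id` collapse into the same series, which is what keeps cardinality down.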

Logs¶

Avoid Translation: Use native APIs whenever possible — point Fluent Bit directly to /insert/jsonline rather than going through an intermediary
Structured logging: Use JSON logs to enable field extraction at query time
Stream fields: Set _stream_fields on ingestion to logically group related log entries
Retention per signal: Set different retention periods for logs (30d) vs metrics (12mo) vs traces (14d)
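
The structured-logging and stream-field advice can be sketched as follows: build a JSON-lines body and a `/insert/jsonline` URL that names the stream and message fields. The query parameters are the ones used in the curl and Fluent Bit examples in this document; the host name and the `build_jsonline_request` helper are assumptions for illustration, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def build_jsonline_request(entries, stream_fields, msg_field="msg",
                           base_url="http://victorialogs:9428"):
    """Return (url, body) for pushing log entries as JSON lines."""
    params = {
        "_stream_fields": ",".join(stream_fields),
        "_msg_field": msg_field,
    }
    url = f"{base_url}/insert/jsonline?{urlencode(params)}"
    # One JSON object per line, newline-terminated.
    body = "\n".join(json.dumps(entry) for entry in entries) + "\n"
    return url, body


url, body = build_jsonline_request(
    [{"app": "test", "msg": "hello", "level": "info"},
     {"app": "test", "msg": "world", "level": "warn"}],
    stream_fields=["app"],
)
print(url)
print(body, end="")
```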

Operations¶

Monitor with itself: Scrape VictoriaMetrics' own /metrics endpoint
Use vmbackup regularly: Schedule daily incremental backups to S3
Test upgrades on LTS: Use the LTS release line for production stability

Common Issues & Playbook¶

Symptom	Likely Cause	Fix
High CPU on vmstorage during queries	Large time-window queries	Limit `-search.maxQueryDuration`, scale vmselect
OOM on vmstorage	High cardinality churn	Tune `-memory.allowedPercent`, drop unused labels at vmagent
"too many unique timeseries"	Query returns too many series	Increase `-search.maxUniqueTimeseries` or refine query
Slow VictoriaLogs queries	Large time range without filters	Add time restrictions (`_time:1h`), use specific filters
vmagent not discovering targets	ServiceMonitor/PodScrape CRDs not picked up	Verify vmoperator is running, check CRD labels
VictoriaTraces not receiving spans	OTLP gRPC not enabled	Explicitly enable gRPC port in config
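Data gap after vmstorage restart	Recently ingested samples were still buffered in memory when the process stopped (VictoriaMetrics does not use a WAL)	Shut vmstorage down gracefully so buffers are flushed to disk; with replication, the other nodes cover the gap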

Monitoring & Troubleshooting¶

Key Self-Monitoring Metrics¶

Metric	What It Tells You
`vm_rows_inserted_total`	Ingestion throughput
`vm_active_timeseries`	Current cardinality
`vm_slow_queries_total`	Queries exceeding duration threshold
`vm_cache_entries`	Cache utilization
`vm_data_size_bytes`	On-disk data size
`process_resident_memory_bytes`	Actual RAM usage
`vm_merge_duration_seconds`	Background compaction health
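
These metrics are exposed in the Prometheus text format on each component's `/metrics` endpoint. The sketch below sums the samples of each metric name from such a payload. The sample text, including its label values and numbers, is invented for illustration, and the parser is deliberately simple: it assumes no timestamps follow the value.

```python
from collections import defaultdict

# Illustrative payload; real label sets and values will differ.
SAMPLE = """\
# TYPE vm_rows_inserted_total counter
vm_rows_inserted_total{type="promremotewrite"} 1200
vm_rows_inserted_total{type="influx"} 300
process_resident_memory_bytes 52428800
"""


def sum_by_metric(text):
    """Sum sample values per metric name, ignoring comments and blank lines."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)   # value is the last field
        name = series.split("{", 1)[0]        # strip the label set, if any
        totals[name] += float(value)
    return dict(totals)


for name, total in sum_by_metric(SAMPLE).items():
    print(name, total)
```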

Commands & Recipes¶

Installation¶

Docker (Quick Start — All Components)¶

# VictoriaMetrics (metrics)
docker run -d --name vm \
  -p 8428:8428 \
  -v vm-data:/storage \
  victoriametrics/victoria-metrics \
  -storageDataPath=/storage -retentionPeriod=12

# VictoriaLogs (logs)
docker run -d --name vl \
  -p 9428:9428 \
  -v vl-data:/vlogs \
  victoriametrics/victoria-logs \
  -storageDataPath=/vlogs -retentionPeriod=30d

# VictoriaTraces (traces)
docker run -d --name vt \
  -p 10428:10428 \
  -p 4317:4317 \
  -v vt-data:/vtraces \
  victoriametrics/victoria-traces \
  -storageDataPath=/vtraces

Docker Compose (Full Stack)¶

# docker-compose.yaml — Full Victoria stack for development
version: '3.8'
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    ports: ["8428:8428"]
    volumes: ["vm-data:/storage"]
    command:
      - "-storageDataPath=/storage"
      - "-retentionPeriod=12"

  victorialogs:
    image: victoriametrics/victoria-logs:latest
    ports: ["9428:9428"]
    volumes: ["vl-data:/vlogs"]
    command:
      - "-storageDataPath=/vlogs"
      - "-retentionPeriod=30d"

  victoriatraces:
    image: victoriametrics/victoria-traces:latest
    ports:
      - "10428:10428"  # HTTP
      - "4317:4317"    # OTLP gRPC
    volumes: ["vt-data:/vtraces"]
    command:
      - "-storageDataPath=/vtraces"

  vmagent:
    image: victoriametrics/vmagent:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - "-promscrape.config=/etc/prometheus/prometheus.yml"
      - "-remoteWrite.url=http://victoriametrics:8428/api/v1/write"

  vmauth:
    image: victoriametrics/vmauth:latest
    ports: ["8427:8427"]
    volumes:
      - ./vmauth-config.yml:/etc/vmauth/config.yml
    command:
      - "-auth.config=/etc/vmauth/config.yml"

  vmalert:
    image: victoriametrics/vmalert:latest
    volumes:
      - ./alert-rules.yml:/etc/rules/rules.yml
    command:
      - "-rule=/etc/rules/*.yml"
      - "-datasource.url=http://victoriametrics:8428"
      - "-remoteWrite.url=http://victoriametrics:8428"

  grafana:
    image: grafana/grafana-oss:latest
    ports: ["3000:3000"]
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

volumes:
  vm-data:
  vl-data:
  vt-data:

Helm (Kubernetes)¶

helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update

# Single-node VictoriaMetrics
helm install vm vm/victoria-metrics-single -n monitoring --create-namespace

# Cluster VictoriaMetrics
helm install vm-cluster vm/victoria-metrics-cluster -n monitoring -f vm-values.yaml

# vmoperator (manages all components via CRDs)
helm install vmoperator vm/victoria-metrics-operator -n monitoring

# vmagent
helm install vmagent vm/victoria-metrics-agent -n monitoring

# vmalert
helm install vmalert vm/victoria-metrics-alert -n monitoring

# VictoriaLogs (single-node)
helm install vl vm/victoria-logs-single -n monitoring

vmagent Recipes¶

# Start vmagent as drop-in Prometheus replacement
./vmagent \
  -promscrape.config=/path/to/prometheus.yml \
  -remoteWrite.url=http://victoriametrics:8428/api/v1/write

# Add global labels to all scraped metrics
./vmagent \
  -remoteWrite.label=datacenter=us-east-1 \
  -remoteWrite.label=env=production \
  -promscrape.config=prometheus.yml \
  -remoteWrite.url=http://vminsert:8480/insert/0/prometheus/api/v1/write

# Multi-destination remote write (fan-out)
./vmagent \
  -remoteWrite.url=http://vm-primary:8428/api/v1/write \
  -remoteWrite.url=http://vm-secondary:8428/api/v1/write

Data Ingestion Recipes¶

Fluent Bit → VictoriaLogs¶

# fluent-bit.conf — Push logs directly to VictoriaLogs
[OUTPUT]
    Name  http
    Match *
    Host  victorialogs
    Port  9428
    URI   /insert/jsonline?_stream_fields=stream&_msg_field=log&_time_field=date
    Format json_lines
    Compress gzip

OpenTelemetry Collector → VictoriaTraces¶

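# otel-collector-config.yaml (fragment: the otlp and prometheus receivers and the batch processor referenced below must also be defined in this file)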
exporters:
  otlp/victoriatraces:
    endpoint: "victoriatraces:4317"
    tls:
      insecure: true

  prometheusremotewrite/vm:
    endpoint: "http://victoriametrics:8428/api/v1/write"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/victoriatraces]
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite/vm]

Promtail / Loki Push → VictoriaLogs¶

# promtail-config.yaml — VictoriaLogs accepts Loki push API
clients:
  - url: http://victorialogs:9428/insert/loki/api/v1/push

Direct OTLP → VictoriaTraces¶

HTTP: http://victoriatraces:10428/insert/opentelemetry/v1/traces
gRPC: grpc://victoriatraces:4317

vmauth Routing Config¶

# vmauth-config.yml — Route all signals through one proxy
unauthorized_user:
  url_map:
    # Metrics write
    - src_paths: ["/api/v1/write", "/api/v1/import.*"]
      url_prefix: "http://vminsert:8480/insert/0/prometheus"

    # Metrics read
    - src_paths: ["/api/v1/query.*", "/api/v1/series.*", "/api/v1/labels.*"]
      url_prefix: "http://vmselect:8481/select/0/prometheus"

    # Logs write
    - src_paths: ["/insert/jsonline.*", "/insert/elasticsearch.*", "/insert/loki/api/v1/push"]
      url_prefix: "http://victorialogs:9428"

    # Logs read
    - src_paths: ["/select/logsql/.*"]
      url_prefix: "http://victorialogs:9428"

    # Traces write
    - src_paths: ["/insert/opentelemetry/.*"]
      url_prefix: "http://victoriatraces:10428"

    # Traces read (Jaeger API)
    - src_paths: ["/api/traces.*", "/api/services.*"]
      url_prefix: "http://victoriatraces:10428"

Backup & Restore¶

# Create instant snapshot (single-node)
curl http://victoriametrics:8428/snapshot/create
# Returns: {"status":"ok","snapshot":"20260410120000-..."}

# Back up a fresh snapshot to S3
./vmbackup \
  -storageDataPath=/data/vm \
  -snapshot.createURL=http://localhost:8428/snapshot/create \
  -dst=s3://my-bucket/vm-backups/latest

# Incremental backup: rerun the same command. When -dst already holds a
# backup, vmbackup uploads only the data that changed since then.

# Dated copy that reuses data already stored in an existing backup (-origin)
./vmbackup \
  -storageDataPath=/data/vm \
  -snapshot.createURL=http://localhost:8428/snapshot/create \
  -dst=s3://my-bucket/vm-backups/2026-04-10 \
  -origin=s3://my-bucket/vm-backups/latest

# Restore from backup (stop VictoriaMetrics first; it must not be running on the target path)
./vmrestore \
  -src=s3://my-bucket/vm-backups/latest \
  -storageDataPath=/data/vm-restored

Note: For clustered setup, vmbackup must be executed on EVERY vmstorage node.

API Recipes¶

# Query VictoriaMetrics (PromQL/MetricsQL)
curl -s "http://vm:8428/api/v1/query?query=up" | jq .

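# Range query (URL-encode the query; a bare [5m] would trip curl's URL globbing)
curl -s -G "http://vm:8428/api/v1/query_range" \
  --data-urlencode "query=rate(http_requests_total[5m])" \
  -d "start=-1h" -d "step=60s" | jq .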

# Import data via JSON
curl -d '{"metric":{"__name__":"test","job":"api"},"values":[1,2,3],"timestamps":[1617000000000,1617000001000,1617000002000]}' \
  http://vm:8428/api/v1/import

# Query VictoriaLogs (LogsQL)
curl -s "http://vl:9428/select/logsql/query?query=_time:5m+AND+error" | jq .

# Push a test log
curl -X POST "http://vl:9428/insert/jsonline?_stream_fields=app&_msg_field=msg" \
  -d '{"app":"test","msg":"hello from curl","level":"info"}'

# Look up a trace by ID (Jaeger API)
curl -s "http://vt:10428/api/traces/abc123" | jq .

# Check health (-f makes curl exit non-zero on an HTTP error status)
curl -sf "http://vm:8428/-/healthy" && echo "OK"
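
When these calls move from the shell into a script, the main thing to get right is URL encoding of the query expression. A minimal Python sketch that builds the range-query URL from the recipe above; the `vm:8428` host is the one used in these examples and `query_range_url` is a hypothetical helper. The URL is only constructed, not fetched.

```python
from urllib.parse import urlencode


def query_range_url(query, start="-1h", step="60s",
                    base_url="http://vm:8428"):
    """Build a /api/v1/query_range URL with a properly encoded query."""
    params = {"query": query, "start": start, "step": step}
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


print(query_range_url("rate(http_requests_total[5m])"))
```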

Grafana Data Source Config¶

# Grafana provisioning for Victoria Stack
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    url: http://vmauth:8427
    isDefault: true
    jsonData:
      httpMethod: POST

  - name: VictoriaLogs
    type: victoriametrics-logs-datasource
    url: http://vmauth:8427

  - name: VictoriaTraces
    type: jaeger
    url: http://vmauth:8427
