Architecture

Deployment Modes

Single-Node vs Cluster

| Feature | Single-Node | Cluster |
| --- | --- | --- |
| Scalability | Vertical only | Horizontal & Vertical |
| Operational Complexity | Very Low (1 binary) | Moderate (3 component types) |
| Multi-tenancy | No | Yes (via account IDs) |
| Replication | No (relies on durable disk) | Yes (-replicationFactor=N) |
| Target Workload | Up to ~1M samples/sec | Billions of series, 100M+ samples/sec |
| External Dependencies | None | None |

Recommendation: Start with single-node. Only move to cluster when you need multi-tenancy, horizontal scaling beyond a single machine, or application-level replication.

Component Roles

In cluster mode, each signal type uses the same three-component pattern:

| Component Role | Metrics | Logs | Traces |
| --- | --- | --- | --- |
| Ingestion | vminsert | vlinsert | vtinsert |
| Querying | vmselect | vlselect | vtselect |
| Storage | vmstorage | vlstorage | vtstorage |

The insert and select components are stateless, the storage components are stateful, and each component type can be scaled independently.

Full Stack Architecture

flowchart TB
    subgraph Sources["Data Sources"]
        K8s["Kubernetes<br/>Pods & Services"]
        Apps["Applications<br/>(OTel SDK)"]
        Infra["Infrastructure<br/>(node_exporter, etc.)"]
        Logs["Log Sources<br/>(Fluent Bit, Logstash)"]
    end

    subgraph Collection["Collection Layer"]
        Agent["vmagent<br/>(DaemonSet)<br/>Scrape + Push"]
        OTel["OTel Collector<br/>(optional)"]
    end

    subgraph Proxy["Routing Layer"]
        Auth["vmauth<br/>Auth · Route · LB"]
    end

    subgraph MetricsCluster["VictoriaMetrics (Metrics)"]
        MI["vminsert ×2"]
        MS["vmstorage ×3<br/>(StatefulSet, SSD)"]
        MSel["vmselect ×2"]
        MI --> MS
        MSel --> MS
    end

    subgraph LogsCluster["VictoriaLogs (Logs)"]
        LI["vlinsert ×2"]
        LS["vlstorage ×3<br/>(StatefulSet, SSD)"]
        LSel["vlselect ×2"]
        LI --> LS
        LSel --> LS
    end

    subgraph TracesCluster["VictoriaTraces (Traces)"]
        TI["vtinsert ×2"]
        TS["vtstorage ×3<br/>(StatefulSet, SSD)"]
        TSel["vtselect ×2"]
        TI --> TS
        TSel --> TS
    end

    subgraph Alerting["Alerting"]
        Alert["vmalert"]
        AM["Alertmanager"]
    end

    subgraph Viz["Visualization"]
        Grafana["Grafana"]
        VMUI["VMUI<br/>(built-in)"]
    end

    Sources --> Collection
    Collection --> Auth
    Logs --> Auth

    Auth -->|"Metrics: /api/v1/write"| MI
    Auth -->|"Logs: /insert/jsonline"| LI
    Auth -->|"Traces: /insert/opentelemetry"| TI

    Auth -->|"PromQL query"| MSel
    Auth -->|"LogsQL query"| LSel
    Auth -->|"Jaeger query"| TSel

    Grafana --> Auth
    VMUI --> Auth
    Alert --> Auth
    Alert -->|"Fire alerts"| AM

    style Sources fill:#0d7377,color:#fff
    style Collection fill:#ff6600,color:#fff
    style Proxy fill:#7b42bc,color:#fff
    style MetricsCluster fill:#2a2d3e,color:#fff
    style LogsCluster fill:#2a7de1,color:#fff
    style TracesCluster fill:#e65100,color:#fff
    style Alerting fill:#c62828,color:#fff
    style Viz fill:#ff6600,color:#fff
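
The write edges in the diagram amount to prefix-based routing at the vmauth layer. A minimal Python sketch of that idea follows. The `route_write` helper and `WRITE_ROUTES` table are hypothetical names, the path prefixes are copied from the diagram's edge labels, and the backends are the diagram's component names rather than real URLs. It illustrates the concept only and is not vmauth's configuration format or implementation.

```python
# Hypothetical routing table built from the diagram's edge labels.
# Backend values are component names from the diagram, not real addresses.
WRITE_ROUTES = [
    ("/api/v1/write", "vminsert"),          # metrics
    ("/insert/jsonline", "vlinsert"),       # logs
    ("/insert/opentelemetry", "vtinsert"),  # traces
]


def route_write(path: str) -> str:
    """Return the ingestion component that should receive a write request."""
    for prefix, backend in WRITE_ROUTES:
        if path.startswith(prefix):
            return backend
    raise LookupError(f"no backend configured for path {path!r}")
```

A request to `/insert/jsonline` would resolve to `vlinsert`, while an unknown path raises `LookupError` instead of being forwarded to a default backend.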

vmalert Evaluation Flow

sequenceDiagram
    participant A as vmalert
    participant P as vmauth (Proxy)
    participant VM as VictoriaMetrics / Logs
    participant AM as Alertmanager

    Note over A: Evaluate Rules (periodic)
    A->>P: POST /api/v1/query (Query Request)
    P->>VM: Inspect path & Forward to backend
    VM-->>P: Return Query Results
    P-->>A: Return Query Results

    alt Alert Triggered
        A->>AM: Send Alert Notification
    else Recording Rule
        A->>VM: Remote Write Results
    end
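
The branch at the end of the sequence diagram can be sketched as a single evaluation step. This is an illustrative Python model, not vmalert's code: `Rule`, `evaluate`, and the three injected callables are hypothetical, `query` stands in for the round trip through vmauth, and the sketch assumes a non-empty result means the alert condition holds, ignoring pending durations and alert state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    expr: str          # query expression sent to the backend
    is_alerting: bool  # True: alerting rule, False: recording rule


def evaluate(rule: Rule,
             query: Callable[[str], List[dict]],
             notify: Callable[[str, List[dict]], None],
             remote_write: Callable[[str, List[dict]], None]) -> None:
    """Run one evaluation cycle for a single rule."""
    results = query(rule.expr)  # request goes via the proxy, results come back
    if not results:
        return                  # nothing matched: no alert, nothing to store
    if rule.is_alerting:
        notify(rule.name, results)        # hand the alert to Alertmanager
    else:
        remote_write(rule.name, results)  # persist recording-rule output
```

Passing the query, notification, and write functions as parameters keeps the sketch free of network code and makes each branch of the diagram easy to exercise with stand-in functions.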

Multi-Source Log Ingestion

VictoriaLogs natively accepts several common log-shipping protocols, so existing agents can send to it without an intermediate translation layer:

flowchart LR
    A["Promtail"] -->|"Loki Push API"| B{"vmauth"}
    C["Fluent Bit"] -->|"JSON Lines"| B
    D["Logstash"] -->|"ES Bulk API"| B
    E["OTel Collector"] -->|"OTLP"| B
    F["rsyslog"] -->|"Syslog"| B
    B -->|"Route & Auth"| G["vlinsert"]
    G --> H[("vlstorage")]

    style B fill:#7b42bc,color:#fff
    style H fill:#2a7de1,color:#fff
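
Of the protocols in the diagram, JSON Lines is the simplest to produce by hand: one JSON object per line. A minimal Python sketch of building such a payload follows. The `to_json_lines` helper is hypothetical, and the field names in the usage note are illustrative assumptions rather than a statement of the required schema.

```python
import json
from typing import Iterable, Mapping


def to_json_lines(records: Iterable[Mapping[str, object]]) -> str:
    """Encode records as JSON Lines: one compact JSON object per line."""
    return "".join(
        json.dumps(dict(record), separators=(",", ":")) + "\n"
        for record in records
    )
```

For example, `to_json_lines([{"_msg": "user login", "level": "info"}])` yields a single line, `{"_msg":"user login","level":"info"}`, terminated by a newline. The resulting string would be the body of a request to the `/insert/jsonline` path shown in the full-stack diagram.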

Storage Layout

VictoriaMetrics (Metrics)

/path/to/vmstorage/data/
├── big/                    # Large, compacted data blocks
│   ├── YYYY_MM/           # Monthly partitions
│   │   ├── parts/         # Compressed TSDB blocks
│   │   └── tmp/           # Temporary merge workspace
├── small/                  # Recently ingested, small blocks
│   └── YYYY_MM/
├── indexdb/               # Inverted index (label → series ID)
└── snapshots/             # Point-in-time snapshots (for vmbackup)

VictoriaLogs (Logs)

/path/to/vlstorage/data/
├── YYYYMMDD/              # Daily partitions
│   ├── bloom_filters/     # Bloom filters for word matching
│   ├── columns/           # Columnar storage (msg, timestamp, labels)
│   └── metadata/
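
The partition directory names in the two layouts above follow directly from a sample's timestamp. A small Python sketch of that mapping follows. The function names are hypothetical, and the use of UTC for partition boundaries is an assumption of this sketch.

```python
from datetime import datetime, timezone


def monthly_partition(ts: datetime) -> str:
    """Name of the YYYY_MM partition directory (metrics layout)."""
    return ts.astimezone(timezone.utc).strftime("%Y_%m")


def daily_partition(ts: datetime) -> str:
    """Name of the YYYYMMDD partition directory (logs layout)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")
```

Because every timestamp maps to exactly one partition, old data can be dropped by removing whole directories rather than rewriting files.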

VictoriaTraces (Traces)

VictoriaTraces uses the same storage engine as VictoriaLogs (daily partitions, bloom filters, columnar format) but organizes data by trace ID and span attributes.
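
To show why bloom filters help with word matching, here is a minimal, generic Bloom filter in Python. It is a textbook sketch, not the on-disk format or hashing scheme used by VictoriaLogs or VictoriaTraces. The class name, the default sizes, and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str) -> Iterator[int]:
        # Derive one bit position per hash by salting BLAKE2b with the index.
        data = word.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(data, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))
```

A query for a word can skip any data block whose filter returns `False` from `might_contain`. A `True` result only means the block has to be scanned, since false positives are possible.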

Kubernetes Deployment Matrix

| Component | Kind | Replicas (Min HA) | Key Resource | Helm Chart |
| --- | --- | --- | --- | --- |
| vmagent | DaemonSet or Deployment | 1 per node (DS) or 2+ | CPU, Memory | victoria-metrics-agent |
| vmauth | Deployment | 2+ | CPU | victoria-metrics-auth |
| vminsert | Deployment | 2+ | CPU | victoria-metrics-cluster |
| vmselect | Deployment | 2+ | CPU, Memory | victoria-metrics-cluster |
| vmstorage | StatefulSet | 3+ | Disk IOPS, Memory | victoria-metrics-cluster |
| VictoriaLogs | StatefulSet (single-node) or cluster | 1–3 | Disk, Memory | victoria-logs-single |
| VictoriaTraces | StatefulSet (single-node) | 1 | Disk, Memory | |
| vmalert | Deployment | 1–2 | CPU | victoria-metrics-alert |
| vmoperator | Deployment | 1 | CPU | victoria-metrics-operator |

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| No external dependencies | No PostgreSQL, Redis, ZooKeeper, or object storage required, which reduces the operational surface |
| Local disk > Object storage | SSDs provide lower latency than S3; compression compensates for limited capacity |
| Shared-nothing cluster | vmstorage nodes don't communicate; each owns its shard, which simplifies scaling |
| Consistent hashing | vminsert distributes data deterministically without consensus protocol overhead |
| Bloom filters (VictoriaLogs) | Dramatically less RAM than inverted indexes at the cost of slightly higher scan overhead |
| Apache 2.0 license | More permissive than AGPL: no copyleft obligations for SaaS usage |
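
The shared-nothing and deterministic-placement decisions can be illustrated with a short Python sketch. It uses rendezvous (highest-random-weight) hashing, one simple way to get coordination-free placement. It is a stand-in for the general idea and is not claimed to be the algorithm vminsert actually uses. The function names, node names, and the SHA-256 choice are assumptions.

```python
import hashlib
from typing import List, Sequence


def _score(node: str, series_key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}\x00{series_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_storage_nodes(series_key: str,
                       nodes: Sequence[str],
                       replication_factor: int = 1) -> List[str]:
    """Return the nodes that should store `series_key`, best-ranked first."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    ranked = sorted(nodes, key=lambda n: _score(n, series_key), reverse=True)
    return ranked[:replication_factor]
```

Every insert node computes the same ranking from the same inputs, so no consensus round is needed. Removing a node that was not chosen for a key leaves that key's placement unchanged, which is the property that keeps rebalancing small when the cluster is resized.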