
How Grafana Works

Core Mechanism

Grafana is fundamentally a query, transform, and visualize engine. It does not store time-series data itself (except for configuration and alert state). Instead, it proxies queries to external data sources and renders the results in a browser.

Request Lifecycle

  1. User opens a dashboard → browser loads the dashboard JSON model
  2. Panel queries are dispatched → each panel sends its query to the Grafana backend
  3. Backend proxies to data source → Grafana translates the query and forwards it to the appropriate backend (Prometheus, Loki, SQL, etc.) using the configured data-source plugin
  4. Results are returned → data frames are sent back to the frontend
  5. Frontend renders → React-based panel plugins render the visualization
```mermaid
sequenceDiagram
    participant User as Browser
    participant GF as Grafana Server
    participant DS as Data Source<br/>(Prometheus, Loki, etc.)

    User->>GF: Open Dashboard
    GF-->>User: Dashboard JSON + Panel Config
    User->>GF: Execute Panel Queries
    GF->>DS: Proxy Query (PromQL, LogQL, SQL...)
    DS-->>GF: Data Frames / Results
    GF-->>User: Transformed Data
    User->>User: Render Visualization
```
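
The dashboard JSON model that drives steps 1–2 looks roughly like the following (a minimal, hand-trimmed sketch — the panel layout, datasource `uid`, and PromQL expression are placeholders, not a complete schema):

```json
{
  "title": "Service Overview",
  "panels": [
    {
      "id": 1,
      "type": "timeseries",
      "title": "CPU Usage",
      "datasource": { "type": "prometheus", "uid": "prom-main" },
      "targets": [
        { "refId": "A", "expr": "rate(node_cpu_seconds_total[5m])" }
      ]
    }
  ]
}
```

Each entry in `targets` becomes one query dispatched in step 2; the `refId` keys the returned data frames back to the panel.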

Data Frames

Grafana uses a unified data abstraction called Data Frames — typed, columnar data structures (similar to Pandas DataFrames) that all data-source plugins must return. This abstraction lets any panel plugin render data from any source without tight coupling.

The LGTM Stack

The Grafana ecosystem addresses all pillars of observability through purpose-built backends:

| Signal     | Backend           | Query Language                       | Storage                           |
| ---------- | ----------------- | ------------------------------------ | --------------------------------- |
| Metrics    | Grafana Mimir     | PromQL                               | Object Storage (S3/GCS/Azure)     |
| Logs       | Grafana Loki      | LogQL                                | Object Storage                    |
| Traces     | Grafana Tempo     | TraceQL                              | Object Storage (Parquet columnar) |
| Profiles   | Grafana Pyroscope | FlameQL                              | Object Storage                    |
| Collection | Grafana Alloy     | HCL-inspired config (Alloy/River)    | N/A (pipeline agent)              |

Cross-Signal Correlation

The true power of the LGTM stack is cross-signal linking:

  • Exemplars: Metric data points carry trace IDs → click a spike in Mimir and jump to the exact trace in Tempo
  • Trace-to-Logs: A trace span carries labels that map to Loki log streams → jump from trace to logs
  • Derived Fields: Loki logs are parsed for trace IDs → jump from logs back to traces
  • Profiles: Pyroscope profiles are linked via labels to traces and metrics
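
As a concrete example of the logs-to-traces direction, a provisioned Loki data source can declare a derived field that extracts trace IDs from log lines and links them to Tempo. A hedged sketch — the regex, URLs, and the `tempo` uid are placeholders for your own setup:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'   # capture group becomes the field value
          url: '$${__value.raw}'            # used as the Tempo query
          datasourceUid: tempo              # uid of the Tempo data source
```
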
```mermaid
flowchart TB
    subgraph Collection["Grafana Alloy (OTel Collector)"]
        direction LR
        R[Receivers<br/>OTLP, Prometheus, Syslog]
        P[Processors<br/>Batch, Filter, Transform]
        E[Exporters<br/>Remote Write, OTLP]
        R --> P --> E
    end

    subgraph Backends["LGTM Backends"]
        Mimir["Mimir<br/>(Metrics)"]
        Loki["Loki<br/>(Logs)"]
        Tempo["Tempo<br/>(Traces)"]
        Pyroscope["Pyroscope<br/>(Profiles)"]
    end

    subgraph Storage["Object Storage"]
        S3["S3 / GCS / Azure Blob"]
    end

    Collection -->|remote_write| Mimir
    Collection -->|push| Loki
    Collection -->|OTLP| Tempo
    Collection -->|push| Pyroscope

    Mimir --> S3
    Loki --> S3
    Tempo --> S3
    Pyroscope --> S3

    subgraph Grafana["Grafana UI"]
        Dash[Dashboards]
        Explore[Explore]
        Alert[Alerting]
    end

    Mimir -.->|PromQL| Grafana
    Loki -.->|LogQL| Grafana
    Tempo -.->|TraceQL| Grafana
    Pyroscope -.->|FlameQL| Grafana

    style Collection fill:#2a2d3e,color:#fff
    style Backends fill:#1a1d2e,color:#fff
    style Storage fill:#0d1117,color:#fff
    style Grafana fill:#ff6600,color:#fff
```

Plugin Architecture

Grafana's extensibility is built on a modular plugin system:

Plugin Types

| Type        | Purpose                                   | Example                                 |
| ----------- | ----------------------------------------- | --------------------------------------- |
| Data Source | Connect to external data backends         | Prometheus, MySQL, Elasticsearch        |
| Panel       | Custom visualization types                | Time series, Stat, Geomap, Flame graph  |
| App         | Bundles of data sources + panels + pages  | Grafana Incident, Grafana OnCall        |
| Renderer    | Server-side image/PDF rendering           | grafana-image-renderer                  |

Plugin Lifecycle

  1. Discovery — Grafana scans the plugin directory on startup
  2. Bootstrap — reads plugin.json metadata (ID, type, dependencies)
  3. Validation — checks plugin signature (signed/unsigned/private)
  4. Initialization — loads frontend (React) and backend (Go via gRPC subprocess)
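
The plugin.json read in step 2 might look like this minimal sketch (the plugin id, executable name, and version below are hypothetical; consult the plugin docs for the full field set):

```json
{
  "id": "myorg-example-datasource",
  "type": "datasource",
  "name": "Example Data Source",
  "backend": true,
  "executable": "gpx_example",
  "info": {
    "version": "1.0.0",
    "author": { "name": "My Org" }
  },
  "dependencies": { "grafanaDependency": ">=9.0.0" }
}
```

`"backend": true` tells Grafana to launch the named executable as a gRPC subprocess during initialization (step 4).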

Frontend ↔ Backend Communication

Backend plugins run as separate processes and communicate with the Grafana server via gRPC. This isolation means:

  • A crashing plugin does not crash Grafana
  • Plugins can implement custom auth, caching, and alerting
  • Sensitive operations (secrets, credentials) stay server-side

Key SDK packages:

  • @grafana/data — data structures, plugin base classes
  • @grafana/ui — reusable React UI components (Grafana design system)
  • @grafana/runtime — runtime services (data fetching, config)
  • Grafana Plugin SDK for Go — server-side plugin development in Go

Alerting Pipeline (Unified Alerting)

Since Grafana 9, alerting has used a unified architecture that works across all data sources:

```mermaid
flowchart LR
    subgraph Rules["Alert Rules"]
        R1["Rule 1<br/>PromQL: cpu > 80%"]
        R2["Rule 2<br/>LogQL: error rate"]
    end

    subgraph Eval["Rule Evaluator"]
        E["Periodic Evaluation<br/>(every N seconds)"]
    end

    subgraph State["Alert State Manager"]
        S["Normal → Pending → Alerting"]
    end

    subgraph NP["Notification Policies"]
        Tree["Routing Tree<br/>(label matchers)"]
    end

    subgraph CP["Contact Points"]
        Slack[Slack]
        PD[PagerDuty]
        Email[Email]
        WH[Webhook]
    end

    Rules --> Eval --> State --> NP
    NP -->|severity=critical| PD
    NP -->|team=backend| Slack
    NP -->|default| Email
    NP -->|custom| WH

    style Rules fill:#1a1d2e,color:#fff
    style Eval fill:#2a2d3e,color:#fff
    style State fill:#2a2d3e,color:#fff
    style NP fill:#ff6600,color:#fff
    style CP fill:#0d7377,color:#fff
```

Key Concepts

  • Alert Rules define what to evaluate and the threshold conditions
  • Labels on alert instances drive routing (e.g., severity=critical, team=infra)
  • Notification Policies form a routing tree — each policy matches labels and routes to contact points
  • Contact Points define destinations (Slack, PagerDuty, Email, Webhook, OpsGenie, etc.)
  • Mute Timings suppress alerts during maintenance windows
  • Silences temporarily suppress specific alert instances during incidents

Data Flow

Grafana Alloy Pipeline

Grafana Alloy (successor to Grafana Agent) is the recommended telemetry collection agent:

  • Two configuration modes:
      • Default Engine (Alloy syntax, formerly River) — HCL-inspired, component-oriented, supports clustering and a debug UI
      • OpenTelemetry Engine — standard YAML OTel Collector config for portability

  • Pipeline stages: Receivers → Processors → Exporters

  • Debug UI available at http://localhost:12345 for real-time pipeline inspection
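
A minimal metrics pipeline in the default Alloy syntax might look like the following sketch (the scrape target and Mimir endpoint URL are placeholders for your environment):

```river
// Scrape a local node_exporter and forward samples onward.
prometheus.scrape "default" {
  targets    = [{ "__address__" = "localhost:9100" }]
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Remote-write the scraped samples to Mimir.
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}
```

Each block is a component; the `forward_to` reference wires components into the Receivers → Processors → Exporters graph that the debug UI visualizes.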

Grafana Mimir (Metrics)

  1. Distributor receives remote-write from Prometheus/Alloy → validates, shards by tenant
  2. Ingester writes to in-memory TSDB + WAL → flushes 2-hour blocks to object storage
  3. Querier executes PromQL across ingesters (recent) and store-gateways (historical)
  4. Compactor merges and deduplicates blocks in object storage
  5. Store-Gateway indexes object storage blocks for fast historical queries

Grafana Loki (Logs)

  1. Distributor receives log streams from Alloy → routes by label hash
  2. Ingester compresses logs into chunks, indexes labels only (not full text)
  3. Querier executes LogQL across ingesters and object storage
  4. Compactor merges index files and enforces retention

Key insight: Loki does not index log content — only metadata labels. This dramatically reduces storage costs but requires queries to start with a label selector.
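
This is why every LogQL query opens with a label selector, which narrows the indexed streams before any content filtering. A representative query (the `app` and `env` labels are assumed example labels):

```logql
{app="checkout", env="prod"} |= "error" | json | status >= 500
```

The `{...}` selector uses the label index; the `|= "error"` line filter and the parsed `status` comparison then scan only the chunks from those streams.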

Grafana Tempo (Traces)

  1. Distributor receives traces (OTLP, Jaeger, Zipkin) → routes by trace ID hash
  2. Ingester batches spans into columnar Apache Parquet blocks, builds bloom filters, and flushes the blocks to object storage
  3. Querier searches by trace ID or uses TraceQL for attribute-based search
  4. Metrics-Generator (optional) extracts RED metrics (Rate, Errors, Duration) from spans → pushes to Mimir

Key insight: Tempo maintains no separate index database — it relies on object storage, the Parquet columnar format, and bloom filters, making it extremely cheap to operate at scale.
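
A representative TraceQL query for the attribute-based search in step 3 (the service name is an assumed example):

```traceql
{ resource.service.name = "checkout" && duration > 500ms }
```

Bloom filters let Tempo skip blocks that cannot contain matching spans; the Parquet layout then lets it read only the `service.name` and `duration` columns rather than whole traces.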

Lifecycle

Grafana Server Lifecycle

  1. Startup — loads grafana.ini config, runs database migrations, discovers plugins
  2. Runtime — serves HTTP/HTTPS, processes API requests, evaluates alert rules, manages sessions
  3. Shutdown — graceful drain of connections, flushes pending alert state

Dashboard Lifecycle

  1. Created (UI or provisioning) → stored as JSON in the Grafana database
  2. Versioned — each save creates a new version (built-in version history)
  3. Provisioned (optional) — dashboards loaded from YAML/JSON files on disk, watched for changes every 10s
  4. Exported — dashboards can be exported as JSON for sharing or IaC
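
A file-based dashboard provider from step 3 is declared in a provisioning YAML like the sketch below (the provider name and path are placeholders; `updateIntervalSeconds: 10` matches the default change-watch interval mentioned above):

```yaml
apiVersion: 1
providers:
  - name: team-dashboards
    type: file
    updateIntervalSeconds: 10
    options:
      path: /var/lib/grafana/dashboards
```

Any dashboard JSON dropped into that directory is loaded on the next scan, making dashboards manageable as code alongside the rest of your infrastructure.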