Grafana
Summary
Grafana is the industry-standard open-source platform for observability and data visualization. Originally created by Torkel Ödegaard in 2014 as a fork of Kibana, it has evolved from a pure dashboarding tool into the centerpiece of a composable, full-stack observability ecosystem — the LGTM stack (Loki, Grafana, Tempo, Mimir) — with an optional fifth pillar, Pyroscope for continuous profiling.
Grafana connects to virtually any data source through its extensible plugin architecture and provides a single pane of glass for metrics, logs, traces, and profiles. It is built with Go (backend) and TypeScript/React (frontend), licensed under AGPL-3.0, and backed by Grafana Labs, a company valued at over $6B (as of 2024).
| Attribute | Detail |
|---|---|
| Repository | github.com/grafana/grafana |
| Stars / Forks | 73.1k ⭐ / 13.7k 🍴 |
| Commits / Releases | 68,000+ commits / 606 releases |
| Latest Version | v12.4.2 (March 2026) |
| Languages | TypeScript (50.3%), Go (43.0%) |
| License | AGPL-3.0 (open-source; plugins mostly Apache 2.0) |
| Founded | 2014 by Torkel Ödegaard |
| Company | Grafana Labs (est. 2014, HQ: New York) |
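To make the data-source model from the summary more concrete, here is a minimal sketch of a provisioning document that registers one data source. It is written in Python for illustration and emits JSON, which is also valid YAML. The helper `build_datasource_provisioning`, the data source name, and the URL are assumptions made for this example; the field names follow Grafana's documented provisioning layout, but check the provisioning docs for the version you run.

```python
import json


def build_datasource_provisioning(name: str, ds_type: str, url: str,
                                  is_default: bool = False) -> dict:
    """Return a provisioning document describing a single data source.

    Illustrative helper, not part of any Grafana SDK.
    """
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": name,
                "type": ds_type,        # plugin id, e.g. "prometheus" or "loki"
                "access": "proxy",      # Grafana's backend proxies the queries
                "url": url,             # hypothetical in-cluster address
                "isDefault": is_default,
            }
        ],
    }


def render(doc: dict) -> str:
    """Serialize the document; JSON output can be saved as a .yaml file."""
    return json.dumps(doc, indent=2)


example = build_datasource_provisioning(
    "Prometheus", "prometheus", "http://prometheus:9090", is_default=True
)

if __name__ == "__main__":
    print(render(example))
```

The rendered text would typically be placed in the data source provisioning directory so that Grafana picks it up at startup, which keeps data source definitions in version control instead of in the UI.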
Evaluation
- Why it's better: Vendor-neutral, open-standards-first (Prometheus, OpenTelemetry), with an extensible plugin ecosystem (100+ data sources) that can unify all telemetry signals (metrics, logs, traces, profiles) in a single interface. Few tools combine this breadth of data-source support with open-source availability.
- When it fits (Applicability): Any team or organization that needs to visualize, correlate, and alert on data from heterogeneous sources. It shines when: you want to avoid vendor lock-in, you use Prometheus/OTel, you need custom dashboards, or you want one tool that connects infra, apps, and business metrics.
- Pros and Cons:
| Pros | Cons |
|---|---|
| Unmatched data-source flexibility | Higher operational overhead when self-hosting the full LGTM stack |
| World-class visualization engine | AGPL-3.0 may create compliance friction for some orgs |
| Massive community and plugin catalog | Unified alerting is powerful but has a steep learning curve |
| Open standards (Prometheus, OTel native) | Default SQLite backend ties a deployment to a single node until you move to MySQL/PostgreSQL |
| Free tier in Grafana Cloud is generous | Enterprise features (fine-grained RBAC, SAML) require paid tier |
| Cross-signal correlation (metrics ↔ logs ↔ traces) | Dashboard sprawl is a real governance challenge at scale |
- Common Use Cases:
    - Infrastructure monitoring (Kubernetes, VMs, networks)
    - Application Performance Monitoring (APM) via Tempo/OTel
    - Log aggregation and exploration via Loki
    - Business metrics dashboards (SQL, Elasticsearch, BigQuery)
    - IoT and industrial telemetry visualization
    - Security dashboards (with Elasticsearch/Loki SIEM patterns)
    - AI/ML pipeline observability (emerging, 2025+)
- Licensing & Commercial Use:
    - Core Grafana: AGPL-3.0 (since April 2021; was Apache 2.0 before)
    - Plugins, agents, SDKs: mostly Apache 2.0
    - You may use unmodified Grafana commercially. If you modify the source and offer it as SaaS, you must release modifications under AGPL-3.0.
    - Grafana Cloud tiers: Free ($0), Pro ($19/mo base + usage), Enterprise ($25k+/yr)
    - Managed alternatives: AWS Managed Grafana (per-user pricing), Azure Managed Grafana (resource-based pricing)
- Ecosystem & Data Connections:
    - Native backends: Prometheus, Loki, Tempo, Mimir, Pyroscope
    - First-party data sources: Elasticsearch, InfluxDB, MySQL, PostgreSQL, CloudWatch, Azure Monitor, Google Cloud Monitoring, Jaeger, Zipkin, Graphite, OpenSearch, and dozens more
    - Plugin catalog: 100+ community and enterprise plugins
    - Collection: Grafana Alloy (OTel Collector distribution), Grafana Agent (legacy)
    - IaC: Official Terraform provider, Helm charts, Ansible roles
    - APIs: Full REST API, provisioning YAML/JSON, gRPC (plugin ↔ server)
- Compatibility & Requirements:
    - Runs on Linux, macOS, Windows, Docker, Kubernetes
    - Backend database: SQLite (default), MySQL 5.7+, PostgreSQL 12+
    - Browser: Modern Chrome, Firefox, Edge, Safari
    - Min resources (single node): 1 CPU, 512 MB RAM
    - Recommended production: External DB (PostgreSQL), optional Redis remote cache, horizontal scaling behind a load balancer
- Alternatives:
    - Datadog: all-in-one SaaS, higher cost, faster time-to-value
    - Kibana: strong for log-centric / ELK-native workflows
    - New Relic: SaaS APM with generous free tier
    - Chronograf: niche, InfluxDB-specific
    - Apache Superset: open-source BI focus, less real-time
    - SigNoz: open-source, OpenTelemetry-native observability
    - Splunk Observability: enterprise, expensive
- Migration & Lock-in Risks:
    - Low lock-in on the visualization layer: dashboards are portable JSON, data sources are external
    - Moderate lock-in if you adopt the full LGTM stack: Loki's LogQL and Tempo's TraceQL are Grafana-specific query languages, whereas Mimir is queried with standard PromQL and ingests via Prometheus remote write. All backends use open storage formats (object storage, Prometheus TSDB)
    - Migration from Datadog/New Relic → Grafana Cloud is well-documented
    - Terraform provider enables IaC portability
- Community Health & Support:
    - One of the top-50 most-starred Go projects on GitHub
    - 73.1k stars, 13.7k forks, 68k+ commits, 1.2k watchers
    - Active: 3.2k open issues, ~700 open PRs
    - Strong community: community.grafana.com, Slack, X/Twitter
    - Enterprise SLAs available through Grafana Labs
    - Regular release cadence: frequent minor and patch releases, with major versions roughly once a year
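Because dashboards are plain JSON, moving one between Grafana instances is mostly a matter of re-pointing its data source references. The following is a minimal sketch under stated assumptions: the dashboard model is heavily simplified (real exports carry many more fields), the UIDs `prom-old` and `prom-new` are hypothetical, and `remap_datasource_uids` is an illustrative helper rather than a Grafana API.

```python
import copy
import json

# Simplified dashboard model; real exports contain many more fields.
DASHBOARD = {
    "uid": "example-dash",  # hypothetical dashboard UID
    "title": "Example service overview",
    "panels": [
        {
            "id": 1,
            "type": "timeseries",
            "title": "Request rate",
            "datasource": {"type": "prometheus", "uid": "prom-old"},
            "targets": [
                {
                    "refId": "A",
                    "datasource": {"type": "prometheus", "uid": "prom-old"},
                    "expr": "sum(rate(http_requests_total[5m]))",
                }
            ],
        }
    ],
}


def remap_datasource_uids(node, mapping):
    """Return a deep copy with every data source uid found in mapping replaced."""
    result = copy.deepcopy(node)
    _remap_in_place(result, mapping)
    return result


def _remap_in_place(node, mapping):
    """Walk dicts and lists, rewriting {"datasource": {"uid": ...}} references."""
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and ds.get("uid") in mapping:
            ds["uid"] = mapping[ds["uid"]]
        for value in node.values():
            _remap_in_place(value, mapping)
    elif isinstance(node, list):
        for item in node:
            _remap_in_place(item, mapping)


migrated = remap_datasource_uids(DASHBOARD, {"prom-old": "prom-new"})

if __name__ == "__main__":
    print(json.dumps(migrated, indent=2))
```

The original dictionary is left untouched, so the same export can be remapped for several target environments. The result could then be loaded into the target instance through file provisioning or the HTTP API.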
Notes In This Folder
Related Topics
- LGTM Stack — the full observability stack built around Grafana (Loki, Grafana, Tempo, Mimir, Pyroscope)
- VictoriaMetrics — alternative metrics backend, often compared to Mimir
- Prometheus — the de-facto metrics standard that Grafana was built around
- OpenTelemetry — the industry-standard telemetry collection framework; Grafana Alloy is an OTel distribution
Assets
Store local images, diagrams, and PDFs in the _assets/ subfolder. Prefer Mermaid for inline diagrams.
Next Actions
- Create comparison notes: Grafana vs Datadog, Grafana vs Kibana
- Research Grafana's AI/ML features (Sift, LLM plugin) in depth
- Benchmark Grafana Cloud vs self-hosted LGTM at various scales
Sources
Primary Sources
| URL | Source Kind | Authority | Retrieved Via | Date |
|---|---|---|---|---|
| https://github.com/grafana/grafana | repository | primary | manual | 2026-04-10 |
| https://grafana.com/docs/grafana/latest/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/mimir/latest/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/loki/latest/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/tempo/latest/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/alloy/latest/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/pyroscope/latest/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/pricing/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/blog/ | blog | primary | web search | 2026-04-10 |
| https://grafana.com/docs/grafana/latest/administration/provisioning/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/grafana/latest/alerting/ | docs | primary | web search | 2026-04-10 |
| https://grafana.com/docs/grafana/latest/developers/plugins/ | docs | primary | web search | 2026-04-10 |
Secondary Sources
| URL | Source Kind | Authority | Retrieved Via | Date |
|---|---|---|---|---|
| https://github.com/grafana/helm-charts | repository | secondary | web search | 2026-04-10 |
| https://github.com/grafana/terraform-provider-grafana | repository | secondary | web search | 2026-04-10 |
| https://registry.terraform.io/providers/grafana/grafana/latest/docs | docs | secondary | web search | 2026-04-10 |
| https://grafana.com/blog/2021/04/20/grafana-loki-tempo-relicensing-to-agplv3/ | blog | primary | web search | 2026-04-10 |
Community Sources
| URL | Source Kind | Authority | Retrieved Via | Date |
|---|---|---|---|---|
| https://community.grafana.com/ | forum | community | manual | 2026-04-10 |
| https://slack.grafana.com/ | chat | community | manual | 2026-04-10 |
| https://play.grafana.org/ | demo | community | manual | 2026-04-10 |
Related Notes
Questions
Open
Answered
- What license is Grafana under? — AGPL-3.0 since April 2021 (was Apache 2.0), resolved in observability/grafana/index
- Can Grafana scale horizontally? — Yes, with external PostgreSQL + Redis and multiple replicas behind a load balancer, resolved in observability/grafana/operations
- What is Grafana Alloy? — The successor to Grafana Agent; a vendor-agnostic OpenTelemetry Collector distribution, resolved in observability/grafana/architecture
- How does Loki differ from Elasticsearch for log storage? — Loki indexes labels only (not full text), making it 10–100x cheaper but requiring label-first queries, resolved in observability/grafana/architecture
- How does Grafana's AI/ML Sift feature work internally, and what are its practical capabilities vs marketing claims? — Sift is a diagnostic assistant (free for all Grafana Cloud accounts) that automatically investigates infrastructure telemetry by correlating metrics, logs, and traces to surface potential incident causes. It can be launched from Grafana Explore, dashboards, or Grafana Incident. It analyzes query labels and time ranges to perform automated root-cause analysis, rather than being a general-purpose AI -- it is scoped to observability data exploration and incident triage, resolved in observability/grafana/architecture
- What is the real-world cost comparison of Grafana Cloud vs self-hosted LGTM at 1M, 10M, and 100M active series? Grafana Cloud bills ~$8 per 1,000 active series per month for metrics plus per-GB for logs/traces (free tier: 10K series, 50 GB logs, 50 GB traces). Self-hosted LGTM at 1M series costs ~$500-1,500/mo in infrastructure; at 10M ~$3,000-8,000/mo; at 100M ~$15,000-50,000/mo. Grafana Cloud is roughly 6-10x more expensive at scale but eliminates operational overhead (1-3 FTE). Grafana offers committed-use discounts of 30-50% on annual contracts, resolved in observability/grafana/architecture
- How does Mimir's query-sharding compare to Thanos's query splitting in terms of latency and resource usage? Mimir query sharding splits each query into N parallel partial queries (configured via `-query-frontend.query-sharding-total-shards`, default 16), parallelizing execution across queriers. It increases querier and store-gateway load but reduces wall-clock latency. Cardinality estimation (experimental) dynamically adjusts shard count based on prior query cardinality. Thanos query splitting divides by time range instead of by series hash, which is less effective for high-cardinality queries. Mimir sharding is generally faster for cardinality-heavy queries but requires more resources, resolved in observability/grafana/architecture
- What is the migration path from an existing ELK stack (Elasticsearch + Kibana) to Loki + Grafana? Grafana strongly recommends using Grafana Alloy (not the Logstash Loki plugin) for new deployments, noting significant challenges with Logstash label configuration and backpressure handling. Migration steps: (1) Deploy Alloy alongside existing Logstash/Filebeat to dual-ship logs, (2) rebuild Kibana dashboards in Grafana (no automated converter exists), (3) rewrite Elasticsearch queries to LogQL, (4) carefully select low-cardinality labels for Loki to avoid performance issues, (5) configure object storage (S3/GCS) for retention. Historical data requires re-indexing or keeping ES read-only during transition, resolved in observability/grafana/operations
- How mature is Pyroscope's continuous profiling integration with Grafana and Tempo? What is the production-readiness level? — Pyroscope is a multi-tenant continuous profiling platform aligned architecturally with Mimir/Loki/Tempo. It integrates via Span Profiles (correlating traces with profiles), supports push-mode SDKs (Go, Java, Python, Ruby, .NET, Rust, Node.js) and eBPF-based pull-mode profiling via Grafana Alloy. Deployed as single-binary for dev or distributed microservices on Kubernetes with object storage backends. The integration with Grafana's Explore Profiles UI and Tempo span correlation is functional and production-viable, resolved in observability/grafana/architecture
- What are the specific differences between Grafana Enterprise and Grafana Cloud Enterprise? — Grafana Enterprise is the self-managed commercial edition adding exclusive data source plugins, advanced admin features (audit logging, RBAC, LDAP/SSO/SAML), and 24x7 support. Grafana Cloud Enterprise provides many of the same Enterprise features in the hosted cloud platform, plus access to Enterprise plugins for contracted plans. Key difference: self-managed Enterprise requires operating your own infrastructure; Cloud Enterprise offloads ops but has feature parity for most administrative capabilities, resolved in observability/grafana/index
- How does Grafana's new Scenes framework change dashboard development for plugin authors? `@grafana/scenes` is a TypeScript framework for building highly interactive, dashboard-like experiences in Grafana app plugins. It provides `SceneApp` with routing/navigation, `EmbeddedScene` for standalone components, `SceneQueryRunner` with `SceneDataTransformer` for data pipelines, `SceneGridLayout` with draggable/resizable panels, template variable support, URL sync, and time range inheritance through a scene graph hierarchy. It replaces manual React dashboard wiring with a declarative component model, resolved in observability/grafana/architecture
- What is the optimal label cardinality strategy for Loki when handling Kubernetes logs at 1000+ pods? Keep indexed labels to ~10 or fewer per stream, using only low-cardinality values (job, app, environment, namespace, cluster, container_name). Move high-cardinality metadata (pod names, process IDs, trace IDs) to structured metadata instead of labels. For OTel ingestion, explicitly remove `k8s.pod.name` and `service.instance.id` from `default_resource_attributes_as_index_labels`. Structured metadata (Loki 3.x+) allows querying without indexing overhead. For 1000+ pods, never use pod_name as a label: it creates thousands of separate streams, resolved in observability/grafana/operations
- How does TraceQL Metrics (experimental) compare to the span-metrics generator in Tempo? TraceQL Metrics (introduced in Tempo 2.4) computes metrics on-the-fly from trace data using TraceQL queries with aggregate functions (e.g., `{ span:name = "foo" } | rate() by (span:status)`), similar to how LogQL metrics work for logs. It requires no separate metrics backend or pre-configuration. The span-metrics generator is a separate Tempo component that pre-computes RED metrics from spans and writes them to a Prometheus-compatible backend. TraceQL Metrics is more flexible for exploratory analysis but computed at query time; span-metrics is faster for dashboards/alerts since metrics are pre-materialized, resolved in observability/grafana/architecture
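The Loki label-cardinality answer above can be checked with a small simulation: a stream is one unique combination of indexed label values, so counting distinct combinations shows what a pod-name label does to the stream count. This is a rough sketch with a made-up fleet (1000 pods, 10 apps, 2 namespaces); the label names, pod naming scheme, and the helper `count_streams` are assumptions for illustration, not Loki code.

```python
def count_streams(pods, indexed_labels):
    """Count unique combinations of the indexed label values.

    Each unique combination corresponds to one stream in this simplified model.
    """
    return len({tuple(pod[label] for label in indexed_labels) for pod in pods})


# Hypothetical fleet: 1000 pods spread over 10 apps in 2 namespaces.
pods = [
    {
        "namespace": f"ns-{i % 2}",
        "app": f"app-{i % 10}",
        "container_name": "main",
        "pod": f"app-{i % 10}-{i}",  # unique per pod, i.e. high cardinality
    }
    for i in range(1000)
]

base_labels = ["namespace", "app", "container_name"]

streams_without_pod = count_streams(pods, base_labels)
streams_with_pod = count_streams(pods, base_labels + ["pod"])

if __name__ == "__main__":
    print(streams_without_pod)  # 10
    print(streams_with_pod)     # 1000
```

In this toy model, indexing the pod name turns 10 streams into 1000, one per pod, which is the effect the guidance warns about. Keeping the pod name in structured metadata leaves the stream count at the level set by the low-cardinality labels.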