
Security

Security model for the LGTM stack (Loki + Grafana + Tempo + Mimir). Covers multi-tenant isolation, OTLP authentication, object storage encryption, and per-tenant overrides. See also: observability/lgtm/index, observability/lgtm/architecture.

Authentication Architecture

In a production LGTM deployment, no backend component (Mimir, Loki, Tempo) should be directly exposed to untrusted networks. Authentication and authorization are enforced at two layers:

  1. Edge proxy (Nginx, Envoy, Grafana Enterprise Gateway) validates identity and injects tenant headers.
  2. Component-level multi-tenancy uses the X-Scope-OrgID header to isolate data.
flowchart TD
    subgraph "External Clients"
        Alloy[Grafana Alloy<br/>Metrics/Logs/Traces]
        Users[Grafana Users]
    end

    subgraph "Auth Layer"
        Gateway[Auth Gateway<br/>Envoy / Nginx / GEM Gateway]
    end

    subgraph "LGTM Backends"
        Mimir[Mimir<br/>Metrics Backend]
        Loki[Loki<br/>Logs Backend]
        Tempo[Tempo<br/>Traces Backend]
    end

    subgraph "Object Storage"
        S3[S3 / GCS / Azure Blob<br/>SSE-S3 or SSE-KMS]
    end

    Alloy -->|OTLP + X-Scope-OrgID| Gateway
    Users -->|HTTP + Auth| Gateway
    Gateway -->|remote_write + tenant header| Mimir
    Gateway -->|OTLP + tenant header| Loki
    Gateway -->|OTLP + tenant header| Tempo
    Mimir --> S3
    Loki --> S3
    Tempo --> S3

Multi-Tenant Isolation

X-Scope-OrgID Header

All three backends use the X-Scope-OrgID HTTP header as the tenant identifier. This is the fundamental isolation mechanism across the LGTM stack.

  • Mimir: -auth.multitenancy-enabled=true enforces the header on every request.
  • Loki: auth_enabled: true in the configuration activates tenant isolation.
  • Tempo: Multi-tenancy configured via the TempoStack CRD or tempo.yaml.
# Mimir multi-tenancy configuration (YAML equivalent of -auth.multitenancy-enabled=true)
multitenancy_enabled: true

# OpenTelemetry Collector OTLP export with tenant header
exporters:
  otlp:
    endpoint: tempo.example.com:4317
    headers:
      x-scope-orgid: tenant-engineering

Tenant ID Rules

| Property | Loki | Mimir | Tempo |
| --- | --- | --- | --- |
| Max length | 150 bytes | Configurable | Configurable |
| Allowed chars | Alphanumeric, !, -, _, ., *, ', (, ) | Alphanumeric | Alphanumeric |
| Invalid values | . and .. | Empty string | Empty string |
| Multi-tenant query | Pipe-separated: A\|B\|C | Pipe-separated with -tenant-federation.enabled=true | Per-tenant overrides only |
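A gateway can reject malformed tenant IDs before they reach a backend. The following is a minimal sketch of the Loki column of the table above; the function name and the decision to reject the empty string for Loki as well are assumptions of this example, not part of Loki's API.

```python
import re

# Characters listed for Loki in the table above: alphanumerics plus ! - _ . * ' ( )
_ALLOWED = re.compile(r"[A-Za-z0-9!\-_.*'()]+")
MAX_TENANT_ID_BYTES = 150


def validate_loki_tenant_id(tenant_id: str) -> None:
    """Raise ValueError if tenant_id breaks the Loki rules in the table above."""
    if not tenant_id:
        # Assumption of this sketch: an empty ID is never a valid tenant.
        raise ValueError("tenant ID must not be empty")
    if tenant_id in (".", ".."):
        raise ValueError("tenant ID must not be '.' or '..'")
    if len(tenant_id.encode("utf-8")) > MAX_TENANT_ID_BYTES:
        raise ValueError("tenant ID exceeds 150 bytes")
    if not _ALLOWED.fullmatch(tenant_id):
        raise ValueError("tenant ID contains unsupported characters")
```

Calling `validate_loki_tenant_id("tenant-engineering")` returns without error, while an ID such as `team/a` raises `ValueError` because `/` is not in the allowed set.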

Data Isolation Guarantees

Each backend stores tenant data in separate prefixes or paths within object storage:

  • Mimir: Tenant data stored under <bucket>/<tenant-id>/ prefix. TSDB blocks are tenant-scoped.
  • Loki: Chunks and indexes separated by tenant. Compactor runs per-tenant.
  • Tempo: Trace data partitioned by tenant in object storage blocks.

Tenants cannot read each other's data unless cross-tenant query federation is explicitly enabled on the backend and the gateway allows multi-tenant requests.

OTLP Authentication

Ingestion Authentication

OTLP ingestion endpoints should not accept unauthenticated traffic. Common patterns:

mTLS (Mutual TLS): Client certificates verify identity at the gateway level. Tempo supports mTLS directly on its gRPC receiver.

# Tempo OTLP gRPC receiver with mutual TLS
# (client certificates are verified against client_ca_file)
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
          tls:
            cert_file: /etc/tempo/certs/server.crt
            key_file: /etc/tempo/certs/server.key
            client_ca_file: /etc/tempo/certs/ca.crt

Token-based authentication: An auth gateway validates bearer tokens or API keys and injects the appropriate X-Scope-OrgID header before forwarding to backends.

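# Envoy token validation via an external authorization service
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    http_service:
      server_uri:
        uri: auth-service:8080
        cluster: auth-service   # must match a cluster defined elsewhere in the Envoy config
        timeout: 0.25s
      authorization_response:
        allowed_upstream_headers:
          patterns:
          - exact: x-scope-orgid   # let the auth service set the tenant header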
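The token-based pattern described above comes down to three steps: validate the bearer token, discard any tenant header the client supplied, and set X-Scope-OrgID from the authenticated identity. The sketch below illustrates that logic only; the token table, the `Unauthorized` exception, and the `authorize` function are hypothetical names, and a real gateway would verify tokens against an identity provider rather than an in-memory dictionary.

```python
from typing import Dict, Mapping

TENANT_HEADER = "X-Scope-OrgID"

# Hypothetical token table; stands in for a call to an identity provider.
TOKEN_TO_TENANT = {"s3cr3t-eng-token": "tenant-engineering"}


class Unauthorized(Exception):
    """Raised when a request carries no valid bearer token."""


def authorize(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the headers to forward upstream, with the tenant header set by the gateway."""
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    tenant = TOKEN_TO_TENANT.get(token.strip()) if scheme.lower() == "bearer" else None
    if tenant is None:
        raise Unauthorized("missing or unknown bearer token")
    # Drop any client-supplied tenant header so callers cannot choose their own
    # tenant, and do not forward the credential to the backend.
    forwarded = {
        k: v
        for k, v in headers.items()
        if k.lower() not in (TENANT_HEADER.lower(), "authorization")
    }
    forwarded[TENANT_HEADER] = tenant
    return forwarded
```

The important design choice is that the tenant is derived from the credential, never from the incoming request: a client that sends its own X-Scope-OrgID value has it overwritten.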

Mimir TLS Configuration

Mimir supports mutual TLS on both HTTP and gRPC interfaces:

-server.http-tls-cert-path=/certs/server.crt
-server.http-tls-key-path=/certs/server.key
-server.http-tls-client-auth="RequireAndVerifyClientCert"
-server.http-tls-ca-path="/certs/ca.crt"
-server.grpc-tls-cert-path=/certs/server.crt
-server.grpc-tls-key-path=/certs/server.key
-server.grpc-tls-client-auth="RequireAndVerifyClientCert"
-server.grpc-tls-ca-path="/certs/ca.crt"

Object Storage Encryption

Server-Side Encryption

All LGTM backends store data in S3-compatible object storage. Encryption at rest is configured at the bucket level:

| Method | Description | Key Management |
| --- | --- | --- |
| SSE-S3 | AWS-managed keys | AWS handles rotation |
| SSE-KMS | Customer-managed KMS key | Full key lifecycle control |
| SSE-C | Customer-provided key | Client-side key management |
| CSE | Client-side encryption | Application encrypts before upload |

Recommendation

Use SSE-KMS with a customer-managed key per LGTM component. This provides audit trails via CloudTrail for key usage and allows independent key rotation for Mimir, Loki, and Tempo data.
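As an illustration of the recommendation, the sketch below builds one S3 default-encryption document per component, each pointing at its own customer-managed key. The key ARNs and the `prod` bucket names are placeholders assumed for this example. The resulting dictionary has the shape that the `ServerSideEncryptionConfiguration` argument of boto3's `put_bucket_encryption` expects; the sketch does not call AWS itself.

```python
from typing import Any, Dict


def sse_kms_default_encryption(kms_key_arn: str) -> Dict[str, Any]:
    """Build an S3 default-encryption document that selects SSE-KMS with one key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


# Hypothetical key ARNs: one customer-managed key per component.
KEYS = {
    "mimir": "arn:aws:kms:us-east-1:123456789012:key/mimir-key-id",
    "loki": "arn:aws:kms:us-east-1:123456789012:key/loki-key-id",
    "tempo": "arn:aws:kms:us-east-1:123456789012:key/tempo-key-id",
}

# Bucket name -> encryption document, using the observability-<component>-<env> scheme.
CONFIGS = {
    f"observability-{component}-prod": sse_kms_default_encryption(arn)
    for component, arn in KEYS.items()
}
```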

Bucket Isolation

Use separate buckets per component to prevent cross-signal data leakage and simplify lifecycle policies:

observability-mimir-<env>    # Metrics data
observability-loki-<env>     # Logs data
observability-tempo-<env>    # Traces data

Bucket policies should restrict access to only the component service account:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:role/mimir-role"},
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::observability-mimir-prod", "arn:aws:s3:::observability-mimir-prod/*"]
    }
  ]
}
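Because the three buckets follow one naming scheme, the per-component policies can be generated rather than hand-written. This is a minimal sketch under two assumptions: that every component's role is named `<component>-role`, as in the Mimir example above, and that the account ID shown is a placeholder.

```python
import json
from typing import Any, Dict

ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]


def component_bucket_policy(component: str, env: str, account_id: str) -> Dict[str, Any]:
    """Build a bucket policy that grants one component's role access to its own bucket."""
    bucket_arn = f"arn:aws:s3:::observability-{component}-{env}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed role naming convention: <component>-role.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{component}-role"},
                "Action": ACTIONS,
                # ListBucket applies to the bucket ARN, object actions to bucket/*.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    for name in ("mimir", "loki", "tempo"):
        print(json.dumps(component_bucket_policy(name, "prod", "123456789012"), indent=2))
```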

Per-Tenant Overrides

Each LGTM backend supports runtime-configurable per-tenant limits. These prevent noisy-neighbor scenarios and enforce fair resource allocation.

Mimir Per-Tenant Limits

overrides:
  "tenant-engineering":
    ingestion_rate: 100000
    ingestion_burst_size: 200000
    max_global_series_per_user: 500000
    max_fetched_chunks_per_query: 2000000
    query_timeout: 120s
  # Mimir has no "*" wildcard. Tenants without an entry above use the defaults
  # from the limits block of the main configuration, for example:
  #   limits:
  #     ingestion_rate: 50000
  #     max_global_series_per_user: 150000

Loki Per-Tenant Limits

overrides:
  "tenant-frontend":
    ingestion_rate_mb: 10
    max_streams_per_user: 100000
    max_chunks_per_query: 100000
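    retention_period: 720h
  # Loki has no "*" wildcard. Tenants without an entry above use the defaults
  # from limits_config in the main configuration, for example:
  #   limits_config:
  #     ingestion_rate_mb: 5
  #     max_streams_per_user: 50000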

Tempo Per-Tenant Limits

overrides:
  "tenant-payments":
    ingestion:
      rate_limit_bytes: 50000000
      burst_size_bytes: 100000000
      max_traces_per_user: 10000
    global:
      max_bytes_per_trace: 5000000
  "*":  # Wildcard: applies to tenants without an explicit entry
    ingestion:
      rate_limit_bytes: 20000000
      max_traces_per_user: 5000

Overrides Runtime Reload

Per-tenant overrides are loaded from a separate YAML file and reloaded at runtime without restarting the component. Loki and Tempo take the file path from per_tenant_override_config (under limits_config and overrides respectively); Mimir takes it from runtime_config.file.
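The reload behavior can be pictured as "re-read the file when it changes, then merge the tenant's entry over the defaults". The sketch below shows that idea only. It reads JSON instead of YAML to stay within the standard library and detects changes by modification time; both are assumptions of this example, and the `OverridesFile` class is hypothetical rather than how the backends implement reloading.

```python
import json
import os
from typing import Any, Dict


class OverridesFile:
    """Re-reads a per-tenant overrides file whenever its modification time changes."""

    def __init__(self, path: str, defaults: Dict[str, Any]) -> None:
        self._path = path
        self._defaults = defaults
        self._mtime_ns = -1
        self._overrides: Dict[str, Dict[str, Any]] = {}

    def limits_for(self, tenant: str) -> Dict[str, Any]:
        """Return the effective limits for a tenant, reloading the file if needed."""
        self._reload_if_changed()
        # Explicit per-tenant values win; everything else falls back to defaults.
        return {**self._defaults, **self._overrides.get(tenant, {})}

    def _reload_if_changed(self) -> None:
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self._path, encoding="utf-8") as fh:
                self._overrides = json.load(fh).get("overrides", {})
            self._mtime_ns = mtime_ns
```

A tenant with no entry in the file simply receives the defaults, which is the fallback behavior the limits examples above rely on.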

Tempo Multi-Tenancy with OIDC

The Tempo Operator supports OIDC-based multi-tenancy with static RBAC:

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: production
spec:
  template:
    gateway:
      enabled: true
    queryFrontend:
      jaegerQuery:
        enabled: true
  tenants:
    mode: static
    authentication:
      - tenantName: engineering
        tenantId: eng-team
        oidc:
          issuerURL: https://dex.YOUR_DOMAIN/dex
          redirectURL: https://tempo-gateway.example.com/oidc/eng-team/callback
          usernameClaim: email
          secret:
            name: tempo-oidc-secret
    authorization:
      roles:
        - name: read-write
          permissions: [read, write]
          resources: [traces]
          tenants: [engineering]
      roleBindings:
        - name: engineers
          roles: [read-write]
          subjects:
            - kind: user
              name: [email protected]

Grafana Integration Security

Data Source Configuration

When connecting Grafana to multi-tenant LGTM backends:

  • Configure the X-Scope-OrgID header in each data source's custom HTTP headers.
  • Use separate data sources per tenant or configure Grafana's data source proxy to inject the correct header.
  • Enable basic auth or bearer token auth on the data source if the gateway requires it.
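The "separate data source per tenant" option can be generated from a tenant list. The sketch below builds a structure shaped like a Grafana data source provisioning file, using the `httpHeaderName1` / `httpHeaderValue1` pair that Grafana uses for custom HTTP headers. The gateway URL, the tenant list, and the function name are placeholders assumed for this example.

```python
import json
from typing import Any, Dict


def tenant_datasource(name: str, ds_type: str, url: str, tenant: str) -> Dict[str, Any]:
    """Build one data source entry that sends a fixed X-Scope-OrgID header."""
    return {
        "name": f"{name} ({tenant})",
        "type": ds_type,      # "prometheus" for Mimir, "loki", or "tempo"
        "access": "proxy",    # requests go through the Grafana backend, not the browser
        "url": url,
        "jsonData": {"httpHeaderName1": "X-Scope-OrgID"},
        # Header values live in secureJsonData so Grafana stores them encrypted.
        "secureJsonData": {"httpHeaderValue1": tenant},
    }


# Hypothetical tenants and gateway URL.
TENANTS = ["tenant-engineering", "tenant-frontend"]
provisioning = {
    "apiVersion": 1,
    "datasources": [
        tenant_datasource("Loki", "loki", "https://gateway.example.internal/loki", t)
        for t in TENANTS
    ],
}

if __name__ == "__main__":
    print(json.dumps(provisioning, indent=2))
```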

Query Security

  • Loki: Cross-tenant queries can filter by tenant with the __tenant_id__ label (e.g., {app="api", __tenant_id__=~"eng.+"}).
  • Loki: Multi-tenant queries require multi_tenant_queries_enabled: true in the querier config.
  • Loki: The push (POST /loki/api/v1/push) and tail (GET /loki/api/v1/tail) endpoints reject multi-tenant requests.
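A client or gateway can apply these rules when it builds the tenant header. The following sketch joins tenant IDs with pipes for query endpoints and refuses to do so for the push and tail endpoints; the `tenant_header` function is a hypothetical helper, not part of any Loki client library.

```python
from typing import Dict, Iterable

# Endpoints that, per the list above, reject multi-tenant requests.
SINGLE_TENANT_ONLY = ("/loki/api/v1/push", "/loki/api/v1/tail")


def tenant_header(path: str, tenants: Iterable[str]) -> Dict[str, str]:
    """Build the X-Scope-OrgID header, pipe-separating IDs for multi-tenant queries."""
    ids = list(dict.fromkeys(tenants))  # de-duplicate while keeping order
    if not ids:
        raise ValueError("at least one tenant ID is required")
    if len(ids) > 1 and path in SINGLE_TENANT_ONLY:
        raise ValueError(f"{path} does not accept multi-tenant requests")
    return {"X-Scope-OrgID": "|".join(ids)}
```

For example, `tenant_header("/loki/api/v1/query_range", ["A", "B", "C"])` yields the header value `A|B|C`, while the same tenant list with `/loki/api/v1/push` raises `ValueError`.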

Network Hardening

| Component | Default Ports | Recommended Exposure |
| --- | --- | --- |
| Mimir distributor | 8080 (HTTP), 9095 (gRPC) | Internal only; behind gateway |
| Loki distributor | 3100 (HTTP) | Internal only; behind gateway |
| Tempo ingester | 3200 (HTTP), 9095 (gRPC) | Internal only; behind gateway |
| Object storage | 443 (HTTPS) | Private endpoint or VPC-only |
| Grafana | 3000 (HTTP) | TLS-terminated proxy |

Hardening Checklist

| Area | Recommendation |
| --- | --- |
| Network isolation | All backends in private VPC/subnet; no direct internet access |
| TLS everywhere | mTLS between components; TLS at gateway edge |
| Auth gateway | Centralize authentication; inject X-Scope-OrgID |
| Per-tenant limits | Configure rate limiting and resource quotas for every tenant |
| Bucket isolation | Separate S3 bucket per component per environment |
| Encryption at rest | SSE-KMS with customer-managed keys |
| Secrets management | Store storage credentials and TLS keys in Vault or cloud secret manager |
| Audit logging | Enable access logs on gateway and object storage |
