Security¶
Security model for the LGTM stack (Loki + Grafana + Tempo + Mimir). Covers multi-tenant isolation, OTLP authentication, object storage encryption, and per-tenant overrides. See also: observability/lgtm/index, observability/lgtm/architecture.
Authentication Architecture¶
In a production LGTM deployment, no backend component (Mimir, Loki, Tempo) should be directly exposed to untrusted networks. Authentication and authorization are enforced at two layers:
- Edge proxy (Nginx, Envoy, Grafana Enterprise Gateway) validates identity and injects tenant headers.
- Component-level multi-tenancy uses the X-Scope-OrgID header to isolate data.
flowchart TD
subgraph "External Clients"
Alloy[Grafana Alloy<br/>Metrics/Logs/Traces]
Users[Grafana Users]
end
subgraph "Auth Layer"
Gateway[Auth Gateway<br/>Envoy / Nginx / GEM Gateway]
end
subgraph "LGTM Backends"
Mimir[Mimir<br/>Metrics Backend]
Loki[Loki<br/>Logs Backend]
Tempo[Tempo<br/>Traces Backend]
end
subgraph "Object Storage"
S3[S3 / GCS / Azure Blob<br/>SSE-S3 or SSE-KMS]
end
Alloy -->|OTLP + X-Scope-OrgID| Gateway
Users -->|HTTP + Auth| Gateway
Gateway -->|remote_write + tenant header| Mimir
Gateway -->|OTLP + tenant header| Loki
Gateway -->|OTLP + tenant header| Tempo
Mimir --> S3
Loki --> S3
Tempo --> S3
Multi-Tenant Isolation¶
X-Scope-OrgID Header¶
All three backends use the X-Scope-OrgID HTTP header as the tenant identifier. This is the fundamental isolation mechanism across the LGTM stack.
- Mimir: -auth.multitenancy-enabled=true enforces the header on every request.
- Loki: auth_enabled: true in the configuration activates tenant isolation.
- Tempo: Multi-tenancy is configured via the TempoStack CRD or tempo.yaml.
# Mimir multi-tenancy configuration (YAML equivalent of -auth.multitenancy-enabled=true)
multitenancy_enabled: true
# OpenTelemetry Collector OTLP export with tenant header
exporters:
  otlp:
    endpoint: tempo.example.com:4317
    headers:
      x-scope-orgid: tenant-engineering
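The same header applies to plain HTTP clients. The sketch below prepares a Loki push request scoped to one tenant using only the Python standard library. The gateway URL and the app label are placeholders, and the function builds the request without sending it, so it illustrates the header mechanics rather than a complete client.

```python
import json
import time
import urllib.request


def build_push_request(gateway_url: str, tenant: str, line: str) -> urllib.request.Request:
    """Prepare (but do not send) a Loki push request scoped to one tenant."""
    payload = {
        "streams": [
            {
                # Placeholder label set; real clients attach their own labels.
                "stream": {"app": "example"},
                # Each value is [<unix epoch in nanoseconds, as a string>, <log line>].
                "values": [[str(time.time_ns()), line]],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{gateway_url}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. In the architecture described here the request would also need whatever credentials the gateway expects.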
Tenant ID Rules¶
| Property | Loki | Mimir | Tempo |
|---|---|---|---|
| Max length | 150 bytes | Configurable | Configurable |
| Allowed chars | Alphanumeric, !, -, _, ., *, ', (, ) | Alphanumeric | Alphanumeric |
| Invalid values | . and .. | Empty string | Empty string |
| Multi-tenant query | Pipe-separated: A\|B\|C | Pipe-separated with -tenant-federation.enabled=true | Per-tenant overrides only |
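A gateway can reject malformed tenant IDs before they reach a backend. The following is a minimal sketch that encodes only the Loki column of the table above (150-byte limit, the listed character set, and the reserved values . and ..). The function names are illustrative and are not part of any Grafana library.

```python
import re

# Limits taken from the Loki column of the table above.
MAX_TENANT_ID_BYTES = 150
_ALLOWED = re.compile(r"[A-Za-z0-9!\-_.*'()]+")
_RESERVED = {".", ".."}


def validate_tenant_id(tenant_id: str) -> None:
    """Raise ValueError if tenant_id breaks one of the table's rules."""
    if not tenant_id:
        raise ValueError("tenant ID must not be empty")
    if tenant_id in _RESERVED:
        raise ValueError(f"tenant ID {tenant_id!r} is reserved")
    if len(tenant_id.encode("utf-8")) > MAX_TENANT_ID_BYTES:
        raise ValueError("tenant ID exceeds 150 bytes")
    if not _ALLOWED.fullmatch(tenant_id):
        raise ValueError(f"tenant ID {tenant_id!r} contains unsupported characters")


def is_valid_tenant_id(tenant_id: str) -> bool:
    """Boolean wrapper around validate_tenant_id."""
    try:
        validate_tenant_id(tenant_id)
    except ValueError:
        return False
    return True
```

Because the pipe character is not in the allowed set, an ID such as a|b is rejected, which keeps a single-tenant field from being read as a multi-tenant query.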
Data Isolation Guarantees¶
Each backend stores tenant data in separate prefixes or paths within object storage:
- Mimir: Tenant data is stored under the <bucket>/<tenant-id>/ prefix. TSDB blocks are tenant-scoped.
- Loki: Chunks and indexes are separated by tenant. The compactor runs per tenant.
- Tempo: Trace data is partitioned by tenant in object storage blocks.
Tenants cannot access each other's data unless the gateway explicitly supports cross-tenant federation.
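The prefix layout can be checked mechanically, for example in an audit script that scans object keys. This sketch assumes keys shaped like <tenant-id>/<rest>, as described for Mimir above. The helper names and the sample key are hypothetical.

```python
def tenant_prefix(tenant_id: str) -> str:
    """Object-key prefix for one tenant, following the <tenant-id>/ layout."""
    if not tenant_id or "/" in tenant_id:
        raise ValueError("tenant ID must be non-empty and must not contain '/'")
    return f"{tenant_id}/"


def key_belongs_to_tenant(key: str, tenant_id: str) -> bool:
    """True only when the key sits under the tenant's own prefix."""
    return key.startswith(tenant_prefix(tenant_id))
```

The trailing slash matters: without it, a check for tenant-a would also match keys owned by tenant-ab.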
OTLP Authentication¶
Ingestion Authentication¶
OTLP ingestion endpoints should not accept unauthenticated traffic. Common patterns:
mTLS (Mutual TLS): Client certificates verify identity at the gateway level. Tempo supports mTLS directly on its gRPC receiver.
# Tempo gRPC TLS (client certificates required)
server:
  http_listen_port: 3200
  grpc_listen_port: 9095
  grpc_tls_config:
    cert_file: /etc/tempo/certs/server.crt
    key_file: /etc/tempo/certs/server.key
    client_auth_type: RequireAndVerifyClientCert
    client_ca_file: /etc/tempo/certs/ca.crt
Token-based authentication: An auth gateway validates bearer tokens or API keys and injects the appropriate X-Scope-OrgID header before forwarding to backends.
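# Envoy token validation
http_filters:
  - name: envoy.filters.http.ext_authz
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
      http_service:
        server_uri:
          uri: auth-service:8080
          cluster: auth-service  # assumed name; must match a cluster defined elsewhere in the Envoy config
          timeout: 0.5s          # assumed value; server_uri requires a timeout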
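The decision an external auth service makes for each request can be sketched as follows. This is an illustrative model, not the API of any particular gateway: the in-memory token table and both function names are hypothetical, and a real deployment would validate tokens against an identity provider or secret store.

```python
import hmac
from typing import Dict, Mapping, Optional

# Hypothetical token table, for illustration only.
TOKEN_TO_TENANT = {
    "token-eng-123": "tenant-engineering",
    "token-pay-456": "tenant-payments",
}


def resolve_tenant(authorization: Optional[str]) -> Optional[str]:
    """Return the tenant for a 'Bearer <token>' header value, or None."""
    if not authorization or not authorization.startswith("Bearer "):
        return None
    presented = authorization[len("Bearer "):]
    for token, tenant in TOKEN_TO_TENANT.items():
        # compare_digest avoids leaking the match position through timing.
        if hmac.compare_digest(presented.encode(), token.encode()):
            return tenant
    return None


def headers_for_backend(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Build upstream headers, or raise PermissionError when unauthenticated."""
    tenant = resolve_tenant(incoming.get("Authorization"))
    if tenant is None:
        raise PermissionError("missing or invalid bearer token")
    # Drop any client-supplied tenant header so callers cannot pick a tenant,
    # and do not forward the credential itself to the backend.
    upstream = {
        k: v for k, v in incoming.items()
        if k.lower() not in ("x-scope-orgid", "authorization")
    }
    upstream["X-Scope-OrgID"] = tenant
    return upstream
```

This sketch overwrites any client-supplied X-Scope-OrgID instead of trusting it, so the tenant always derives from the authenticated identity. A gateway that lets clients send the header would instead need to check it against the tenants the token may use.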
Mimir TLS Configuration¶
Mimir supports mutual TLS on both HTTP and gRPC interfaces:
-server.http-tls-cert-path=/certs/server.crt
-server.http-tls-key-path=/certs/server.key
-server.http-tls-client-auth="RequireAndVerifyClientCert"
-server.http-tls-ca-path="/certs/ca.crt"
-server.grpc-tls-cert-path=/certs/server.crt
-server.grpc-tls-key-path=/certs/server.key
-server.grpc-tls-client-auth="RequireAndVerifyClientCert"
-server.grpc-tls-ca-path="/certs/ca.crt"
Object Storage Encryption¶
Server-Side Encryption¶
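All LGTM backends persist data in object storage (S3, GCS, or Azure Blob). The options below use AWS S3 terminology: server-side encryption is configured on the bucket, while client-side encryption happens in the application before upload.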
| Method | Description | Key Management |
|---|---|---|
| SSE-S3 | AWS-managed keys | AWS handles rotation |
| SSE-KMS | Customer-managed KMS key | Full key lifecycle control |
| SSE-C | Customer-provided key | Client-side key management |
| CSE | Client-side encryption | Application encrypts before upload |
Recommendation
Use SSE-KMS with a customer-managed key per LGTM component. This provides audit trails via CloudTrail for key usage and allows independent key rotation for Mimir, Loki, and Tempo data.
Bucket Isolation¶
Use separate buckets per component to prevent cross-signal data leakage and simplify lifecycle policies:
observability-mimir-<env> # Metrics data
observability-loki-<env> # Logs data
observability-tempo-<env> # Traces data
Bucket policies should restrict access to only the component service account:
{
  "Effect": "Allow",
  "Principal": {"AWS": "arn:aws:iam::123456789012:role/mimir-role"},
  "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::observability-mimir-prod", "arn:aws:s3:::observability-mimir-prod/*"]
}
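Keeping one statement per component is easier when the statements are generated rather than hand-edited. The sketch below derives a statement for each component from the bucket naming scheme above. The loki-role and tempo-role names are assumptions extrapolated from mimir-role, and the account ID is a placeholder.

```python
import json


def bucket_policy_statement(component: str, env: str, role_arn: str) -> dict:
    """Allow one component role to use its own bucket and nothing else."""
    bucket_arn = f"arn:aws:s3:::observability-{component}-{env}"
    return {
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
        # ListBucket applies to the bucket ARN; object actions apply to bucket/*.
        "Resource": [bucket_arn, f"{bucket_arn}/*"],
    }


def policies_for(env: str, account_id: str) -> dict:
    """One statement per LGTM component, keyed by component name."""
    return {
        c: bucket_policy_statement(c, env, f"arn:aws:iam::{account_id}:role/{c}-role")
        for c in ("mimir", "loki", "tempo")
    }


if __name__ == "__main__":
    # Placeholder account ID; prints the Mimir statement as JSON.
    print(json.dumps(policies_for("prod", "123456789012")["mimir"], indent=2))
```

Each statement would still need to be wrapped in a full policy document before being attached to a bucket.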
Per-Tenant Overrides¶
Each LGTM backend supports runtime-configurable per-tenant limits. These prevent noisy-neighbor scenarios and enforce fair resource allocation.
Mimir Per-Tenant Limits¶
overrides:
  "tenant-engineering":
    ingestion_rate: 100000
    ingestion_burst_size: 200000
    max_global_series_per_user: 500000
    max_chunks_per_query: 2000000
    query_timeout: 120s
  "*": # Default for all tenants without explicit override
    ingestion_rate: 50000
    max_global_series_per_user: 150000
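The interaction between a sustained rate and a burst size can be pictured as a token bucket. The class below is an illustrative model only, not the limiter code of any backend. It assumes the rate is expressed in units per second and that the bucket starts full. The class and method names are hypothetical.

```python
class TokenBucket:
    """Illustrative limiter: `rate` units per second, at most `burst` stored."""

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, cost: float, now: float) -> bool:
        """Spend `cost` tokens at time `now` (seconds); False means reject."""
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True
```

With the tenant-engineering values above (rate 100000, burst 200000), a single batch of 200000 is accepted from a full bucket, an immediate follow-up is rejected, and half a second later 50000 more are accepted. A batch larger than the burst size is never accepted, no matter how long the tenant has been idle.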
Loki Per-Tenant Limits¶
overrides:
  "tenant-frontend":
    ingestion_rate_mb: 10
    max_streams_per_user: 100000
    max_chunks_per_query: 100000
    retention_period: 720h
  "*":
    ingestion_rate_mb: 5
    max_streams_per_user: 50000
Tempo Per-Tenant Limits¶
overrides:
  "tenant-payments":
    ingestion:
      rate_limit_bytes: 50000000
      burst_size_bytes: 100000000
      max_traces_per_user: 10000
    global:
      max_bytes_per_trace: 5000000
  "*":
    ingestion:
      rate_limit_bytes: 20000000
      max_traces_per_user: 5000
Overrides Runtime Reload
Per-tenant overrides are loaded from a separate YAML file and reloaded at runtime without restarting the component. Loki and Tempo point to this file with per_tenant_override_config; Mimir reads it through runtime_config.file (-runtime-config.file).
Tempo Multi-Tenancy with OIDC¶
The Tempo Operator supports OIDC-based multi-tenancy with static RBAC:
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: production
spec:
  template:
    gateway:
      enabled: true
    queryFrontend:
      jaegerQuery:
        enabled: true
  tenants:
    mode: static
    authentication:
      - tenantName: engineering
        tenantId: eng-team
        oidc:
          issuerURL: https://dex.YOUR_DOMAIN/dex
          redirectURL: https://tempo-gateway.example.com/oidc/eng-team/callback
          usernameClaim: email
          secret:
            name: tempo-oidc-secret
    authorization:
      roles:
        - name: read-write
          permissions: [read, write]
          resources: [traces]
          tenants: [engineering]
      roleBindings:
        - name: engineers
          roles: [read-write]
          subjects:
            - kind: user
              name: [email protected]
Grafana Integration Security¶
Data Source Configuration¶
When connecting Grafana to multi-tenant LGTM backends:
- Configure the X-Scope-OrgID header in each data source's custom HTTP headers.
- Use separate data sources per tenant or configure Grafana's data source proxy to inject the correct header.
- Enable basic auth or bearer token auth on the data source if the gateway requires it.
Query Security¶
- Loki: Cross-tenant queries can filter by tenant with the __tenant_id__ label (e.g., {app="api", __tenant_id__=~"eng.+"}).
- Multi-tenant queries require multi_tenant_queries_enabled: true in the querier config.
- Push (POST /loki/api/v1/push) and tail (GET /loki/api/v1/tail) endpoints reject multi-tenant requests.
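The rules above can be expressed as small client-side helpers. This is a sketch under stated assumptions: the function names are hypothetical, the base URL is a placeholder, and check_single_tenant only imitates the rejection that the push and tail endpoints perform on the server.

```python
from typing import Dict, List
from urllib.parse import urlencode


def multi_tenant_header(tenants: List[str]) -> Dict[str, str]:
    """Join tenant IDs with '|' for a cross-tenant read request."""
    if not tenants:
        raise ValueError("at least one tenant is required")
    for tenant in tenants:
        if "|" in tenant:
            raise ValueError(f"tenant ID {tenant!r} must not contain '|'")
    return {"X-Scope-OrgID": "|".join(tenants)}


def query_url(base: str, logql: str) -> str:
    """URL for Loki's instant-query endpoint with the LogQL expression encoded."""
    return f"{base}/loki/api/v1/query?{urlencode({'query': logql})}"


def check_single_tenant(headers: Dict[str, str]) -> None:
    """Imitate the push/tail behaviour: refuse a multi-tenant header."""
    if "|" in headers.get("X-Scope-OrgID", ""):
        raise ValueError("push and tail accept exactly one tenant")
```

A read across two tenants would combine multi_tenant_header(["eng-a", "eng-b"]) with query_url(base, '{app="api", __tenant_id__=~"eng.+"}'), while the same header passed to check_single_tenant raises.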
Network Hardening¶
| Component | Default Ports | Recommended Exposure |
|---|---|---|
| Mimir distributor | 8080 (HTTP), 9095 (gRPC) | Internal only; behind gateway |
| Loki distributor | 3100 (HTTP) | Internal only; behind gateway |
| Tempo distributor | 3200 (HTTP), 9095 (gRPC), 4317/4318 (OTLP receivers) | Internal only; behind gateway |
| Object storage | 443 (HTTPS) | Private endpoint or VPC-only |
| Grafana | 3000 (HTTP) | TLS-terminated proxy |
Hardening Checklist¶
| Area | Recommendation |
|---|---|
| Network isolation | All backends in private VPC/subnet; no direct internet access |
| TLS everywhere | mTLS between components; TLS at gateway edge |
| Auth gateway | Centralize authentication; inject X-Scope-OrgID |
| Per-tenant limits | Configure rate limiting and resource quotas for every tenant |
| Bucket isolation | Separate S3 bucket per component per environment |
| Encryption at rest | SSE-KMS with customer-managed keys |
| Secrets management | Store storage credentials and TLS keys in Vault or cloud secret manager |
| Audit logging | Enable access logs on gateway and object storage |