Operations¶
Deployment, configuration, scaling, monitoring, upgrades, and day-2 operations for Zitadel.
Deployment¶
Docker Compose (Quick Start)¶
```yaml
services:
  zitadel:
    image: "ghcr.io/zitadel/zitadel:v4.13.0"
    command: >
      start-from-init
      --config /example-zitadel-config.yaml
      --config /example-zitadel-secrets.yaml
      --steps /example-zitadel-init-steps.yaml
      --masterkey "${ZITADEL_MASTERKEY}"
      --tlsMode disabled
    ports:
      - "8080:8080"
    depends_on:
      db:
        condition: "service_healthy"
    environment:
      ZITADEL_MASTERKEY: "${ZITADEL_MASTERKEY}"
  db:
    image: postgres:17-alpine
    healthcheck:
      # Check readiness as the actual application user to avoid noisy
      # "role does not exist" log entries
      test: ["CMD-SHELL", "pg_isready -U zitadel -d zitadel"]
      interval: 5s
      timeout: 5s
      retries: 5
    environment:
      POSTGRES_USER: zitadel
      POSTGRES_PASSWORD: zitadel
      POSTGRES_DB: zitadel
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```
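Zitadel requires the master key to be exactly 32 bytes. A quick way to generate one before bringing the stack up (a sketch; any method that produces 32 random characters works):

```shell
# Generate a random 32-character master key (Zitadel requires exactly 32 bytes)
export ZITADEL_MASTERKEY="$(tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32)"

# Sanity-check the length before starting the stack
echo "${#ZITADEL_MASTERKEY}"  # prints 32
```

With the variable exported, `docker compose up` picks it up through the `${ZITADEL_MASTERKEY}` substitution in the compose file.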
Kubernetes via Helm (Production)¶
```shell
# Add the Helm repo
helm repo add zitadel https://charts.zitadel.com
helm repo update zitadel

# Install with custom values
helm install zitadel zitadel/zitadel \
  --namespace zitadel --create-namespace \
  --values values.yaml
```
Key Helm values (`values.yaml`):

```yaml
zitadel:
  masterkey: "" # Set via --set or external secrets
  configmapConfig:
    ExternalPort: 443
    ExternalDomain: auth.example.com
    TLS:
      Enabled: false # TLS terminated at ingress
    Database:
      Postgres:
        Host: postgres-ha.zitadel.svc.cluster.local
        Port: 5432
        Database: zitadel
        User:
          Username: zitadel
          SSL:
            Mode: verify-full
    Cache:
      Enabled: true
      Config:
        Connection:
          Addrs:
            - redis.zitadel.svc.cluster.local:6379
  secretConfig:
    Database:
      Postgres:
        User:
          Password: "" # Set via external secrets

replicaCount: 3

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: auth.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: zitadel-tls
      hosts:
        - auth.example.com
```
Lifecycle Phases (Kubernetes)¶
Zitadel's Helm chart separates the deployment into distinct phases:
```mermaid
flowchart LR
    Init["Init Job\n(DB creation,\ngrant setup)"]
    Setup["Setup Job\n(Schema migration,\ndefault resources)"]
    Runtime["Runtime Deployment\n(Serving pods)"]
    Init --> Setup --> Runtime
```
- Init Job — Creates the database, grants permissions
- Setup Job — Runs schema migrations, creates default resources (admin user, default org)
- Runtime Deployment — Starts serving pods behind a Service
Configuration¶
Key Configuration Parameters¶
| Parameter | Default | Description |
|---|---|---|
| `ExternalDomain` | `localhost` | Public-facing domain for OIDC endpoints |
| `ExternalPort` | `80`/`443` | Public-facing port |
| `TLS.Enabled` | `false` | Enable TLS on the Zitadel pod (use `false` with ingress TLS) |
| `Database.Postgres.Host` | `localhost` | PostgreSQL host |
| `Database.Postgres.Port` | `5432` | PostgreSQL port |
| `Cache.Enabled` | `false` | Enable Redis caching |
| `Log.Level` | `info` | Log level (`debug`, `info`, `warning`, `error`) |
| `Telemetry.Enabled` | `true` | Anonymous usage telemetry |
Environment Variables¶
| Variable | Required | Description |
|---|---|---|
| `ZITADEL_MASTERKEY` | Yes | Encryption master key; must be exactly 32 characters (32 bytes) |
| `ZITADEL_EXTERNALSECURE` | No | Set to `true` if TLS is terminated externally |
| `ZITADEL_DATABASE_POSTGRES_USER_PASSWORD` | No | PostgreSQL password (use secret refs in K8s) |
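In Kubernetes, the master key is best kept out of `values.yaml` entirely. The chart can read it from an existing Secret; this is a sketch, and `masterkeySecretName` is the value name used by recent chart versions, so verify it against the chart you deploy:

```yaml
zitadel:
  # Read the master key from an existing Kubernetes Secret instead of values.yaml.
  # Assumes a Secret "zitadel-masterkey" created out-of-band (kubectl or an
  # external-secrets operator); verify the value name against your chart version.
  masterkeySecretName: zitadel-masterkey
```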
TLS Configuration¶
```yaml
# Option 1: TLS at the Zitadel pod
zitadel:
  configmapConfig:
    TLS:
      Enabled: true
  secretConfig:
    TLS:
      Key: |-
        -----BEGIN RSA PRIVATE KEY-----
        ...
      Certificate: |-
        -----BEGIN CERTIFICATE-----
        ...
```

```yaml
# Option 2: TLS at the ingress (recommended)
ingress:
  enabled: true
  tls:
    - secretName: zitadel-tls
      hosts:
        - auth.example.com
```
NGINX Reverse Proxy Configuration¶
When using NGINX for TLS termination and HTTP/2 support:
```nginx
server {
    listen 443 ssl http2;
    server_name auth.example.com;

    ssl_certificate     /etc/nginx/tls/tls.crt;
    ssl_certificate_key /etc/nginx/tls/tls.key;

    location / {
        proxy_pass http://zitadel:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Native gRPC clients need end-to-end HTTP/2, which proxy_pass does not
    # provide; route gRPC traffic through grpc_pass instead. The path prefix
    # below is illustrative; match it to your gRPC service routes.
    location /zitadel. {
        grpc_pass grpc://zitadel:8080;
    }
}
```
Scaling¶
Horizontal Scaling¶
Zitadel is stateless — all state lives in PostgreSQL and Redis:
```shell
# Scale replicas directly
kubectl scale deployment zitadel --replicas=5 -n zitadel

# Or use HPA
kubectl autoscale deployment zitadel \
  --min=3 --max=10 \
  --cpu-percent=70 \
  -n zitadel
```
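The same autoscaling policy can be expressed declaratively as a standard `autoscaling/v2` manifest, which is easier to keep in version control; names match the Helm release above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zitadel
  namespace: zitadel
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zitadel
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```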
Database Scaling¶
| Strategy | Approach |
|---|---|
| Read replicas | PostgreSQL streaming replication for read-heavy query workloads |
| Connection pooling | PgBouncer or built-in connection pooling |
| Patroni | HA with automatic failover |
| Managed DB | CloudSQL, RDS, or Azure Database for PostgreSQL |
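For the connection-pooling row, a minimal PgBouncer configuration might look like this. It is a sketch: hostnames and sizes are placeholders, and session pooling is chosen as the safer default because transaction pooling can interfere with prepared statements:

```ini
[databases]
; Route the zitadel database through the pooler (placeholder host)
zitadel = host=postgres-ha.zitadel.svc.cluster.local port=5432 dbname=zitadel

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
; Session pooling avoids prepared-statement pitfalls; tune sizes to your load
pool_mode = session
max_client_conn = 500
default_pool_size = 50
```

Point `Database.Postgres.Host` at the pooler's address and port afterwards.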
Caching¶
Redis reduces query load on PostgreSQL:
| Cache Target | TTL | Impact |
|---|---|---|
| User memberships | Short | Reduces permission resolution latency |
| Organization policies | Medium | Reduces login flow database queries |
| OIDC configuration | Long | Rarely changes, high hit rate |
Monitoring¶
Health Check Endpoints¶
```shell
# REST health check
curl -f http://zitadel:8080/admin/v1/healthz

# gRPC health check (-plaintext because TLS is terminated upstream)
grpcurl -plaintext zitadel:8080 zitadel.admin.v1.AdminService/Healthz

# Kubernetes readiness (configured in Helm chart)
kubectl get pods -n zitadel
```
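The Helm chart wires these endpoints into pod probes already; for a hand-rolled deployment, probes could be sketched like this (the path follows the health endpoint above; delays and periods are placeholder tuning values):

```yaml
livenessProbe:
  httpGet:
    path: /admin/v1/healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /admin/v1/healthz
    port: 8080
  periodSeconds: 10
```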
OpenTelemetry Integration¶
Zitadel supports push-based OpenTelemetry for all signals:
```yaml
zitadel:
  configmapConfig:
    Telemetry:
      OpenTelemetry:
        Traces:
          Enabled: true
          Endpoint: "otel-collector.observability.svc.cluster.local:4317"
          Insecure: true
        Metrics:
          Enabled: true
          Endpoint: "otel-collector.observability.svc.cluster.local:4317"
        Logs:
          Enabled: true
          Endpoint: "otel-collector.observability.svc.cluster.local:4317"
```
Key Metrics to Watch¶
| Metric | Alert Threshold | Description |
|---|---|---|
| Request latency (p99) | > 500ms | API responsiveness |
| Event store push latency | > 100ms | Write path health |
| Projection lag | > 1000 events | Read model freshness |
| PostgreSQL connections | > 80% max | Connection pool saturation |
| Redis hit rate | < 90% | Cache effectiveness |
| Login error rate | > 1% | Authentication failures |
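The latency threshold in the table can be encoded as a Prometheus alerting rule. This is a sketch: the metric name `http_request_duration_seconds_bucket` follows the common OpenTelemetry/Prometheus histogram convention and may differ in your setup; check the names your collector actually exports.

```yaml
groups:
  - name: zitadel-slo
    rules:
      - alert: ZitadelHighRequestLatency
        # p99 over a 5-minute window; 0.5s matches the table's threshold
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{job="zitadel"}[5m])) by (le)
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Zitadel p99 request latency above 500ms"
```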
Upgrades¶
Kubernetes Upgrade Process¶
```shell
# Update Helm repo
helm repo update zitadel

# Dry-run to verify
helm upgrade zitadel zitadel/zitadel \
  --namespace zitadel \
  --values values.yaml \
  --dry-run

# Perform upgrade (Setup Job handles migrations)
helm upgrade zitadel zitadel/zitadel \
  --namespace zitadel \
  --values values.yaml
```
The upgrade process:

1. A new image tag triggers a Setup Job for database migrations
2. Migrations are backwards-compatible where possible
3. A rolling deployment updates pods one at a time
4. Zero downtime is supported for minor version upgrades
Version Upgrade Compatibility¶
| Upgrade Path | Notes |
|---|---|
| v4.x to v4.x+1 | Rolling update, zero downtime |
| v3.x to v4.x | Requires CockroachDB to PostgreSQL migration |
| v4.x to v5.x | Upcoming, migration guide expected |
Backup & Recovery¶
Database Backup¶
```shell
# PostgreSQL logical backup
pg_dump -h postgres-host -U zitadel -d zitadel > zitadel-backup.sql

# Restore
psql -h postgres-host -U zitadel -d zitadel < zitadel-backup.sql
```
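For scheduled backups in Kubernetes, a nightly CronJob around `pg_dump` is a common pattern. This is a sketch: the image, schedule, Secret name, and PVC name are placeholders to adapt to your environment.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: zitadel-backup
  namespace: zitadel
spec:
  schedule: "0 2 * * *" # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:17-alpine
              command: ["/bin/sh", "-c"]
              # Custom-format dump (-Fc) so pg_restore can restore selectively
              args:
                - pg_dump -h postgres-ha.zitadel.svc.cluster.local -U zitadel -d zitadel -Fc -f "/backup/zitadel-$(date +%Y%m%d).dump"
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: zitadel-db # placeholder Secret
                      key: password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: zitadel-backup # placeholder PVC
```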
Event Store Recovery¶
Because all state is derived from events, recovery is:

1. Restore PostgreSQL from backup
2. Zitadel replays unprocessed events through projections
3. Read models are rebuilt to match the event log
Master Key Rotation¶
Rotating the master key requires re-encrypting all stored secrets. Plan for downtime during rotation and test in staging first.
Common Issues¶
| Issue | Cause | Resolution |
|---|---|---|
| Login redirects fail | `ExternalDomain` misconfigured | Verify the domain matches the OIDC redirect URI |
| gRPC errors behind proxy | Missing HTTP/2 support | Enable HTTP/2 in the NGINX/proxy config |
| Slow API responses | Projection lag | Check projection worker health, increase resources |
| `connection refused` to DB | Network policy / service mesh | Verify PostgreSQL is reachable from Zitadel pods |
| Token validation fails | Clock skew between services | Ensure NTP sync on all nodes |
| MFA enrollment loop | Login policy conflicts | Check org-level vs. instance-level policy precedence |