
Operations

Deployment, configuration, scaling, monitoring, upgrades, and day-2 operations for Zitadel.

Deployment

Docker Compose (Quick Start)

services:
  zitadel:
    image: "ghcr.io/zitadel/zitadel:v4.13.0"
    command: >
      start-from-init
      --config /example-zitadel-config.yaml
      --config /example-zitadel-secrets.yaml
      --steps /example-zitadel-init-steps.yaml
      --masterkey "${ZITADEL_MASTERKEY}"
      --tlsMode disabled
    ports:
      - "8080:8080"
    depends_on:
      db:
        condition: "service_healthy"
    environment:
      ZITADEL_MASTERKEY: "${ZITADEL_MASTERKEY}"

  db:
    image: postgres:17-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U zitadel -d zitadel"]
      interval: 5s
      timeout: 5s
      retries: 5
    environment:
      POSTGRES_USER: zitadel
      POSTGRES_PASSWORD: zitadel
      POSTGRES_DB: zitadel
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
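The compose file expects `ZITADEL_MASTERKEY` in the environment, and Zitadel requires the master key to be exactly 32 bytes long. A minimal sketch for generating one before starting the stack (the `tr`/`head` pipeline is just one way to produce a random key):

```shell
# Generate a random 32-character master key; Zitadel rejects keys that
# are not exactly 32 bytes long.
ZITADEL_MASTERKEY="$(tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32)"
export ZITADEL_MASTERKEY
echo "${#ZITADEL_MASTERKEY}"   # prints 32
```

With the key exported, `docker compose up -d` brings up the stack. Store the key safely: losing it makes encrypted data in the database unrecoverable.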

Kubernetes via Helm (Production)

# Add Helm repo
helm repo add zitadel https://charts.zitadel.com
helm repo update zitadel

# Install with custom values
helm install zitadel zitadel/zitadel \
  --namespace zitadel --create-namespace \
  --values values.yaml

Key Helm values (values.yaml):

zitadel:
  masterkey: ""  # Set via --set or external secrets
  configmapConfig:
    ExternalPort: 443
    ExternalDomain: auth.example.com
    TLS:
      Enabled: false  # TLS terminated at ingress
    Database:
      Postgres:
        Host: postgres-ha.zitadel.svc.cluster.local
        Port: 5432
        Database: zitadel
        User:
          Username: zitadel
          SSL:
            Mode: verify-full
    Cache:
      Enabled: true
      Config:
        Connection:
          Addrs:
            - redis.zitadel.svc.cluster.local:6379

  secretConfig:
    Database:
      Postgres:
        User:
          Password: ""  # Set via external secrets

replicaCount: 3

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: auth.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: zitadel-tls
      hosts:
        - auth.example.com
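Rather than inlining the masterkey in values.yaml, the chart can reference a pre-created Secret. A sketch, assuming the `zitadel.masterkeySecretName` chart option (verify the exact option and required data key against the chart version you deploy):

```yaml
# values.yaml fragment: read the master key from an existing Secret
# (the Secret is expected to carry the key under "masterkey")
zitadel:
  masterkeySecretName: zitadel-masterkey   # hypothetical Secret name
```

The Secret itself can be created out of band, e.g. `kubectl create secret generic zitadel-masterkey --from-literal=masterkey="${ZITADEL_MASTERKEY}" -n zitadel`, which keeps the key out of version-controlled values files.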

Lifecycle Phases (Kubernetes)

Zitadel's Helm chart separates the deployment into distinct phases:

flowchart LR
    Init["Init Job\n(DB creation,\ngrant setup)"]
    Setup["Setup Job\n(Schema migration,\ndefault resources)"]
    Runtime["Runtime Deployment\n(Serving pods)"]

    Init --> Setup --> Runtime

1. Init Job — Creates the database and grants permissions
2. Setup Job — Runs schema migrations and creates default resources (admin user, default org)
3. Runtime Deployment — Starts the serving pods behind a Service

Configuration

Key Configuration Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| ExternalDomain | localhost | Public-facing domain for OIDC endpoints |
| ExternalPort | 80/443 | Public-facing port |
| TLS.Enabled | false | Enable TLS on the Zitadel pod (use false with ingress TLS) |
| Database.Postgres.Host | localhost | PostgreSQL host |
| Database.Postgres.Port | 5432 | PostgreSQL port |
| Cache.Enabled | false | Enable Redis caching |
| Log.Level | info | Log level (debug, info, warning, error) |
| Telemetry.Enabled | true | Anonymous usage telemetry |

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| ZITADEL_MASTERKEY | Yes | Encryption master key, exactly 32 characters (32 bytes) |
| ZITADEL_EXTERNALSECURE | No | Set to true if TLS is terminated externally |
| ZITADEL_DATABASE_POSTGRES_USER_PASSWORD | No | PostgreSQL password (use secret refs in K8s) |
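On Kubernetes, the password variable can be wired from a Secret instead of a plain environment value. A pod-spec sketch (the Secret name and key are placeholders):

```yaml
# Container spec fragment: inject the DB password from a Secret rather
# than hard-coding it in the manifest
env:
  - name: ZITADEL_DATABASE_POSTGRES_USER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: zitadel-db-credentials   # hypothetical Secret
        key: password
```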

TLS Configuration

# Option 1: TLS at Zitadel pod
zitadel:
  configmapConfig:
    TLS:
      Enabled: true
  secretConfig:
    TLS:
      Key: |-
        -----BEGIN RSA PRIVATE KEY-----
        ...
      Certificate: |-
        -----BEGIN CERTIFICATE-----
        ...

# Option 2: TLS at ingress (recommended)
ingress:
  enabled: true
  tls:
    - secretName: zitadel-tls
      hosts:
        - auth.example.com
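The `zitadel-tls` Secret referenced by the ingress can be issued automatically. A cert-manager sketch, assuming a ClusterIssuer named `letsencrypt-prod` already exists in the cluster:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: zitadel-tls
  namespace: zitadel
spec:
  secretName: zitadel-tls        # matches ingress.tls.secretName above
  dnsNames:
    - auth.example.com
  issuerRef:
    name: letsencrypt-prod       # hypothetical ClusterIssuer
    kind: ClusterIssuer
```

cert-manager then keeps the certificate renewed without further operator action.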

NGINX Reverse Proxy Configuration

When using NGINX for TLS termination and HTTP/2 support:

server {
    listen 443 ssl http2;
    server_name auth.example.com;

    ssl_certificate /etc/nginx/tls/tls.crt;
    ssl_certificate_key /etc/nginx/tls/tls.key;

    location / {
        # Zitadel multiplexes gRPC and HTTP on one port. grpc_pass keeps
        # HTTP/2 end to end; plain proxy_pass would downgrade the backend
        # connection to HTTP/1.1 and break gRPC.
        grpc_pass grpc://zitadel:8080;
        grpc_set_header Host $host;
        grpc_set_header X-Real-IP $remote_addr;
        grpc_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        grpc_set_header X-Forwarded-Proto $scheme;
    }
}

Scaling

Horizontal Scaling

Zitadel's serving pods are stateless; all persistent state lives in PostgreSQL, with Redis acting only as a cache:

# Scale replicas directly
kubectl scale deployment zitadel --replicas=5 -n zitadel

# Or use HPA
kubectl autoscale deployment zitadel \
  --min=3 --max=10 \
  --cpu-percent=70 \
  -n zitadel
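The imperative `kubectl autoscale` command above is equivalent to the following declarative manifest, which is easier to keep in version control:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: zitadel
  namespace: zitadel
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: zitadel
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```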

Database Scaling

| Strategy | Approach |
| --- | --- |
| Read replicas | PostgreSQL streaming replication for read-heavy query workloads |
| Connection pooling | PgBouncer or built-in connection pooling |
| Patroni | HA with automatic failover |
| Managed DB | Cloud SQL, RDS, or Azure Database for PostgreSQL |
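For the connection-pooling row, a minimal PgBouncer sketch (host and pool sizes are illustrative). Session pooling is used here as the conservative default; transaction pooling packs connections tighter but needs care with prepared statements, so test it against your workload first:

```ini
[databases]
zitadel = host=postgres-ha.zitadel.svc.cluster.local port=5432 dbname=zitadel

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
pool_mode = session
default_pool_size = 20
max_client_conn = 500
```

Point `Database.Postgres.Host`/`Port` at the PgBouncer service instead of PostgreSQL directly.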

Caching

Redis reduces query load on PostgreSQL:

| Cache Target | TTL | Impact |
| --- | --- | --- |
| User memberships | Short | Reduces permission resolution latency |
| Organization policies | Medium | Reduces login flow database queries |
| OIDC configuration | Long | Rarely changes, high hit rate |

Monitoring

Health Check Endpoints

# REST health check
curl -f http://zitadel:8080/admin/v1/healthz

# gRPC health check
grpcurl -plaintext zitadel:8080 zitadel.admin.v1.AdminService/Healthz

# Kubernetes readiness (configured in Helm chart)
kubectl get pods -n zitadel
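The Helm chart wires these checks into Kubernetes probes. A sketch of what equivalent probes look like, assuming Zitadel's `/debug/healthz` and `/debug/ready` endpoints (verify the paths against your chart version):

```yaml
# Container spec fragment: liveness and readiness probes
livenessProbe:
  httpGet:
    path: /debug/healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /debug/ready
    port: 8080
  periodSeconds: 5
```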

OpenTelemetry Integration

Zitadel supports push-based OpenTelemetry for all signals:

zitadel:
  configmapConfig:
    Telemetry:
      OpenTelemetry:
        Traces:
          Enabled: true
          Endpoint: "otel-collector.observability.svc.cluster.local:4317"
          Insecure: true
        Metrics:
          Enabled: true
          Endpoint: "otel-collector.observability.svc.cluster.local:4317"
        Logs:
          Enabled: true
          Endpoint: "otel-collector.observability.svc.cluster.local:4317"

Key Metrics to Watch

| Metric | Alert Threshold | Description |
| --- | --- | --- |
| Request latency (p99) | > 500ms | API responsiveness |
| Event store push latency | > 100ms | Write path health |
| Projection lag | > 1000 events | Read model freshness |
| PostgreSQL connections | > 80% of max | Connection pool saturation |
| Redis hit rate | < 90% | Cache effectiveness |
| Login error rate | > 1% | Authentication failures |
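The p99 latency row can be turned into a Prometheus alerting rule. A sketch only: the metric name below follows common histogram conventions and must be checked against the names Zitadel actually exports (scrape the metrics endpoint to confirm):

```yaml
groups:
  - name: zitadel
    rules:
      - alert: ZitadelHighRequestLatency
        # assumed histogram metric name; verify against the real scrape
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_server_duration_bucket{job="zitadel"}[5m])) by (le)
          ) > 0.5
        for: 10m
        labels:
          severity: warning
```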

Upgrades

Kubernetes Upgrade Process

# Update Helm repo
helm repo update zitadel

# Dry-run to verify
helm upgrade zitadel zitadel/zitadel \
  --namespace zitadel \
  --values values.yaml \
  --dry-run

# Perform upgrade (Setup Job handles migrations)
helm upgrade zitadel zitadel/zitadel \
  --namespace zitadel \
  --values values.yaml

The upgrade process:

1. A new image tag triggers a Setup Job for database migrations
2. Migrations are backwards-compatible where possible
3. A rolling deployment updates pods one at a time
4. Zero downtime is supported for minor version upgrades

Version Upgrade Compatibility

| Upgrade Path | Notes |
| --- | --- |
| v4.x to v4.x+1 | Rolling update, zero downtime |
| v3.x to v4.x | Requires CockroachDB to PostgreSQL migration |
| v4.x to v5.x | Upcoming, migration guide expected |

Backup & Recovery

Database Backup

# PostgreSQL logical backup
pg_dump -h postgres-host -U zitadel -d zitadel > zitadel-backup.sql

# Restore
psql -h postgres-host -U zitadel -d zitadel < zitadel-backup.sql
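To keep repeated runs from overwriting each other, backups can be date-stamped. A sketch that only derives the target path (the directory is illustrative; the dump command itself is the same `pg_dump` shown above, piped through gzip):

```shell
# Derive a timestamped, compressed target path for the dump
BACKUP_DIR="/var/backups/zitadel"
BACKUP_FILE="${BACKUP_DIR}/zitadel-$(date +%Y%m%d-%H%M%S).sql.gz"
echo "${BACKUP_FILE}"
```

Usage: `pg_dump -h postgres-host -U zitadel -d zitadel | gzip > "${BACKUP_FILE}"`.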

Event Store Recovery

Because all state is derived from events, recovery is:

1. Restore PostgreSQL from backup
2. Zitadel replays unprocessed events through projections
3. Read models are rebuilt to match the event log

Master Key Rotation

Rotating the master key requires re-encrypting all stored secrets. Plan for downtime during rotation and test in staging first.

Common Issues

| Issue | Cause | Resolution |
| --- | --- | --- |
| Login redirects fail | ExternalDomain misconfigured | Verify the domain matches the OIDC redirect URI |
| gRPC errors behind proxy | Missing HTTP/2 support | Enable HTTP/2 in the NGINX/proxy config |
| Slow API responses | Projection lag | Check projection worker health, increase resources |
| `connection refused` to DB | Network policy / service mesh | Verify PostgreSQL is reachable from Zitadel pods |
| Token validation fails | Clock skew between services | Ensure NTP sync on all nodes |
| MFA enrollment loop | Login policy conflicts | Check org-level vs instance-level policy precedence |
