Operations

Scope

Production deployment patterns, high-availability setup, performance tuning, upgrade procedures, and common operational issues.

Production Deployment

High Availability Architecture

ArgoCD supports HA deployment by running multiple replicas of most components:

Component                          Replicas          Notes
argocd-server                      2+                Stateless, load-balanced via Ingress
argocd-repo-server                 2+                CPU-intensive; scales with repo count
argocd-application-controller      1 (sharded)       Runs as a StatefulSet; shard across clusters
argocd-redis                       1 (HA optional)   Use Redis Sentinel for HA (the HA manifests deploy it behind HAProxy)
argocd-dex-server                  1                 Keeps state in memory; extra replicas would serve inconsistent data
argocd-applicationset-controller   1                 Leader election
# Install HA manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml
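
After applying the HA manifests, it can help to wait for each workload to finish rolling out before relying on the installation. The sketch below assumes the workload names used by the upstream manifests (argocd-server, argocd-repo-server, argocd-application-controller) and the argocd namespace; the function name wait_for_argocd is hypothetical.

```shell
# wait_for_argocd [NAMESPACE]
# Blocks until each core ArgoCD workload reports a completed rollout.
# Workload names are assumed to match the upstream manifests.
wait_for_argocd() {
  local ns="${1:-argocd}" workload
  for workload in \
    deployment/argocd-server \
    deployment/argocd-repo-server \
    statefulset/argocd-application-controller
  do
    # Returns non-zero if the rollout does not finish within the timeout.
    kubectl -n "$ns" rollout status "$workload" --timeout=300s || return 1
  done
}

# Usage: wait_for_argocd argocd
```

The function stops at the first workload that fails to roll out, so a non-zero exit code points at the component that needs attention.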

Controller Sharding

For multi-cluster deployments (50+ clusters), shard the application controller:

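# argocd-cmd-params-cm ConfigMap
data:
  controller.sharding.algorithm: "round-robin"  # or "legacy" (the default)
# The shard count is not set here: set spec.replicas on the
# argocd-application-controller StatefulSet and the same value in its
# ARGOCD_CONTROLLER_REPLICAS environment variable (for example, 3).

Each shard handles a subset of clusters. The algorithm is read from argocd-cmd-params-cm, while the shard count comes from the controller StatefulSet's replica count, which must match its ARGOCD_CONTROLLER_REPLICAS environment variable.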
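
As a rough illustration of how round-robin sharding spreads clusters, the sketch below maps a cluster's position in an ordered list to a shard with a modulo. This is a simplification for intuition, not ArgoCD's actual code. The second helper shows one way to apply a shard count to the controller workload; both function names are hypothetical, and the StatefulSet name argocd-application-controller is assumed from the upstream manifests.

```shell
# shard_for INDEX REPLICAS
# Simplified round-robin: the cluster at position INDEX goes to shard INDEX mod REPLICAS.
shard_for() {
  echo $(( $1 % $2 ))
}

# set_controller_shards COUNT [NAMESPACE]
# Scales the controller and keeps ARGOCD_CONTROLLER_REPLICAS in step with it.
set_controller_shards() {
  local count="$1" ns="${2:-argocd}"
  kubectl -n "$ns" set env statefulset/argocd-application-controller \
    "ARGOCD_CONTROLLER_REPLICAS=$count" &&
  kubectl -n "$ns" scale statefulset/argocd-application-controller --replicas="$count"
}

# Usage: with 3 replicas, clusters 0..5 land on shards 0 1 2 0 1 2
# for i in 0 1 2 3 4 5; do shard_for "$i" 3; done
```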

Performance Tuning

Repo Server Optimization

Parameter                            Default         Recommended   Impact
ARGOCD_EXEC_TIMEOUT                  90s             180s          Avoids timeouts when rendering large Helm charts
reposerver.parallelism.limit         0 (unlimited)   10            Caps concurrent manifest generation to prevent CPU spikes
ARGOCD_GIT_ATTEMPTS_COUNT            1               3             Retries Git requests on transient failures
server.repo.server.timeout.seconds   60              120           Gives large monorepo requests more time to complete
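
One way to apply the recommended values is sketched below. It assumes the uppercase parameters are environment variables on the argocd-repo-server Deployment and the dotted ones are keys in the argocd-cmd-params-cm ConfigMap; the function name tune_repo_server is hypothetical.

```shell
# tune_repo_server [NAMESPACE]
# Applies the "Recommended" column from the table above.
tune_repo_server() {
  local ns="${1:-argocd}"
  # Dotted parameters live in argocd-cmd-params-cm.
  kubectl -n "$ns" patch configmap argocd-cmd-params-cm --type merge -p \
    '{"data":{"reposerver.parallelism.limit":"10","server.repo.server.timeout.seconds":"120"}}' || return 1
  # Changing the pod template env also triggers a repo server rollout,
  # which picks up the ConfigMap change at the same time.
  kubectl -n "$ns" set env deployment/argocd-repo-server \
    ARGOCD_EXEC_TIMEOUT=180s ARGOCD_GIT_ATTEMPTS_COUNT=3 || return 1
  # The API server reads its timeout at startup, so restart it as well.
  kubectl -n "$ns" rollout restart deployment/argocd-server
}

# Usage: tune_repo_server argocd
```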

Redis Tuning

# For large deployments (1000+ Applications)
data:
  redis.server: "argocd-redis-ha-haproxy:6379"
  redis.compression: "gzip"  # Reduces memory by ~40%

Monorepo Performance

Monorepos with 10k+ files can significantly slow manifest generation and reconciliation:

  1. Limit concurrent git ls-remote requests: Set ARGOCD_GIT_LS_REMOTE_PARALLELISM=3
  2. Use webhook-driven sync instead of polling
  3. Configure resource.exclusions to skip irrelevant resource kinds
  4. Set timeout.reconciliation: 300s in argocd-cm (up from default 180s)

Operational Procedures

Upgrade Strategy

Breaking Changes

Always check the upgrade guide before upgrading. CRD changes may require manual migration.

# 1. Backup current state
kubectl get applications -n argocd -o yaml > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > projects-backup.yaml

# 2. Apply new manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/ha/install.yaml

# 3. Verify health
argocd app list         # Verify all apps are Synced and Healthy
argocd admin dashboard  # Optional: serves the UI locally (runs in the foreground)
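
For a scriptable post-upgrade check, the sketch below lists Applications that are not both Synced and Healthy by reading the status fields of the Application resources. The function name unhealthy_apps is hypothetical, and the argocd namespace is assumed.

```shell
# unhealthy_apps [NAMESPACE]
# Prints the name of every Application that is not Synced or not Healthy.
unhealthy_apps() {
  local ns="${1:-argocd}"
  kubectl get applications -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}' |
    awk -F'\t' 'NF && ($2 != "Synced" || $3 != "Healthy") { print $1 }'
}

# Usage: an empty result means every app is Synced and Healthy.
# unhealthy_apps argocd
```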

Disaster Recovery

  • Export all Applications: argocd admin export > backup.yaml
  • Declarative setup: Store all Application and AppProject manifests in Git (recommended)
  • Redis persistence: Not critical if Apps are declarative, because Redis is only a cache
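
The export can be wrapped in a small helper that keeps timestamped copies, with a matching restore that feeds a file back through argocd admin import. This is a sketch under assumptions: the function names and the file naming scheme are hypothetical, and the argocd namespace is assumed.

```shell
# backup_argocd DIR [NAMESPACE]
# Writes a timestamped export and prints the path of the file it created.
backup_argocd() {
  local dir="$1" ns="${2:-argocd}" file
  file="$dir/argocd-backup-$(date +%Y%m%d-%H%M%S).yaml"
  mkdir -p "$dir" || return 1
  argocd admin export -n "$ns" > "$file" || return 1
  echo "$file"
}

# restore_argocd FILE [NAMESPACE]
# Feeds a previous export back in through stdin.
restore_argocd() {
  local file="$1" ns="${2:-argocd}"
  argocd admin import -n "$ns" - < "$file"
}

# Usage: backup_argocd /var/backups/argocd
```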

Common Issues

Issue                       Root Cause                      Resolution
Sync stuck in Progressing   Resource health check failing   Check resource.customizations.health.<group_kind> in argocd-cm
ComparisonError             Manifest generation timeout     Increase ARGOCD_EXEC_TIMEOUT and controller.repo.server.timeout.seconds
High memory on controller   Too many watched resources      Enable controller sharding
Webhook not triggering      Secret mismatch                 Verify webhook secret in argocd-secret
SSO login loop              Dex callback URL mismatch       Check url in argocd-cm

Monitoring

Key Metrics (Prometheus)

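# 95th percentile reconciliation duration
histogram_quantile(0.95, sum by (le) (rate(argocd_app_reconcile_bucket[5m])))

# Sync operations per app and phase over the last hour
sum by (name, phase) (increase(argocd_app_sync_total[1h]))

# Controller queue depth (should be near 0)
workqueue_depth{name="app_reconciliation_queue"}

# Git request rate by request type
sum by (request_type) (rate(argocd_git_request_total[5m]))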

# Application health status
argocd_app_info{health_status="Degraded"}
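
To evaluate one of these expressions outside a dashboard, the Prometheus HTTP API can be queried directly. The sketch below wraps the instant-query endpoint; the function name prom_query and the Prometheus URL in the usage line are placeholders.

```shell
# prom_query BASE_URL EXPR
# Runs an instant query against the Prometheus HTTP API and prints the JSON response.
prom_query() {
  local base_url="$1" expr="$2"
  # --data-urlencode handles the braces and quotes in PromQL selectors.
  curl -sS --get "$base_url/api/v1/query" --data-urlencode "query=$expr"
}

# Usage (URL is a placeholder):
# prom_query http://prometheus:9090 'argocd_app_info{health_status="Degraded"}'
```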

Alerting Rules

- alert: ArgoCDAppOutOfSync
  expr: argocd_app_info{sync_status="OutOfSync"} == 1
  for: 30m
  labels:
    severity: warning

- alert: ArgoCDHighReconcileQueue
  expr: workqueue_depth{name="app_reconciliation_queue"} > 100
  for: 10m
  labels:
    severity: critical

Resource Requirements

Deployment Size   Apps       Clusters   Controller CPU / Memory   Repo Server CPU / Memory
Small             < 50       1-3        500m / 512Mi              500m / 512Mi
Medium            50-200     3-10       2 / 2Gi                   2 / 2Gi
Large             200-1000   10-50      4 / 4Gi (sharded)         4 / 4Gi (3 replicas)
Enterprise        1000+      50+        8 / 8Gi (multi-shard)     8 / 8Gi (5+ replicas)

Commands & Recipes

Installation

# Install ArgoCD on K8s
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Install CLI
curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd

# Get initial admin password
argocd admin initial-password -n argocd

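# Login (assumes the API server is port-forwarded:
#   kubectl port-forward svc/argocd-server -n argocd 8080:443)
argocd login localhost:8080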

Application Management

# Create application
argocd app create myapp \
  --repo https://github.com/org/repo.git \
  --path k8s/overlays/production \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace production \
  --sync-policy automated \
  --auto-prune --self-heal

# Sync (manual)
argocd app sync myapp

# Sync with specific revision
argocd app sync myapp --revision feature-branch

# View app status
argocd app get myapp
argocd app diff myapp

# Rollback
argocd app rollback myapp <history-id>

# Delete the app and its managed resources (cascading delete)
argocd app delete myapp --cascade
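
In scripts and CI jobs, a sync is usually followed by a wait so the job fails if the app never becomes healthy. A minimal sketch, with a hypothetical function name and an assumed default timeout of 300 seconds:

```shell
# sync_and_wait APP [TIMEOUT_SECONDS]
# Triggers a sync, then waits until the app is both Synced and Healthy.
sync_and_wait() {
  local app="$1" timeout="${2:-300}"
  argocd app sync "$app" || return 1
  argocd app wait "$app" --sync --health --timeout "$timeout"
}

# Usage: sync_and_wait myapp 600
```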

Multi-Cluster

# Add target cluster
argocd cluster add my-context --name production-cluster

# List clusters
argocd cluster list

ApplicationSet

# Deploy to all clusters from monorepo directories
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - clusters: {}
          - git:
              repoURL: https://github.com/org/infra.git
              revision: HEAD
              directories:
                - path: addons/*
  template:
    metadata:
      name: "{{name}}-{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/infra.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: "{{server}}"
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
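
To see what the matrix generator produces, it helps to expand the cluster and directory lists by hand: every cluster is paired with every matching directory, and the name template joins the two. The sketch below only imitates that naming for illustration; the function name and the cluster and directory names in the usage line are made up.

```shell
# expand_matrix "CLUSTERS" "DIRS"
# Prints one Application name per (cluster, directory) pair, mirroring the
# "{{name}}-{{path.basename}}" template. Inputs are space-separated lists.
expand_matrix() {
  local cluster dir
  for cluster in $1; do
    for dir in $2; do
      echo "$cluster-$(basename "$dir")"
    done
  done
}

# Usage:
# expand_matrix "staging production" "addons/cert-manager addons/ingress-nginx"
# prints staging-cert-manager, staging-ingress-nginx,
# production-cert-manager, production-ingress-nginx (one per line)
```

Two clusters and two addon directories therefore yield four Applications, so the count grows multiplicatively as either list grows.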

Sources