ArgoCD — Operations

Scope

Production deployment patterns, high-availability setup, performance tuning, upgrade procedures, and common operational issues.

Production Deployment

High Availability Architecture

ArgoCD supports HA deployment with multiple replicas of each component:

Component                          Replicas         Notes
argocd-server                      2+               Stateless, load-balanced via Ingress
argocd-repo-server                 2+               CPU-intensive; scales with repo count
argocd-application-controller      1 (sharded)      Leader election; shard across clusters
argocd-redis                       1 (HA optional)  Use Redis Sentinel or Redis Cluster for HA
argocd-dex-server                  2+               Stateless SSO proxy
argocd-applicationset-controller   1                Leader election

# Install HA manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

Controller Sharding

For multi-cluster deployments (50+ clusters), shard the application controller:

# argocd-cmd-params-cm ConfigMap
data:
  controller.sharding.algorithm: "round-robin"  # or "legacy"
  controller.replicas: "3"

Each shard handles a subset of clusters. The controller uses a ConfigMap to coordinate shard assignment.
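The round-robin idea can be illustrated as a simple modulo assignment. This is a hypothetical simplification in Python, not the controller's actual code — the real controller coordinates shard ownership through a ConfigMap and also offers the hash-based "legacy" algorithm:

```python
# Illustrative sketch only: round-robin assignment of clusters to
# controller shards. Function name and data shapes are hypothetical.

def assign_shards(clusters: list[str], replicas: int) -> dict[str, int]:
    """Map each cluster to a shard index, distributing them evenly."""
    # Sorting makes the assignment deterministic across restarts.
    return {cluster: i % replicas
            for i, cluster in enumerate(sorted(clusters))}

shards = assign_shards(["prod-eu", "prod-us", "staging", "dev"], replicas=3)
```

With this scheme each shard owns at most ceil(len(clusters) / replicas) clusters, which is why round-robin tends to balance load better than hashing when cluster counts are small.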

Performance Tuning

Repo Server Optimization

Parameter                            Default        Recommended  Impact
ARGOCD_EXEC_TIMEOUT                  90s            180s         Large Helm charts time out at the default
reposerver.parallelism.limit         0 (unlimited)  10           Caps concurrent manifest generation, preventing CPU spikes
ARGOCD_GIT_ATTEMPTS_COUNT            1              3            Retries transient Git failures
server.repo.server.timeout.seconds   60             120          Accommodates large monorepo syncs
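These settings live in two different places. A sketch of how the ConfigMap keys might be applied (verify the key names against your ArgoCD version):

```yaml
# argocd-cmd-params-cm carries the dotted keys
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  reposerver.parallelism.limit: "10"
  server.repo.server.timeout.seconds: "120"
```

The ALL_CAPS parameters (ARGOCD_EXEC_TIMEOUT, ARGOCD_GIT_ATTEMPTS_COUNT) are environment variables and are set on the argocd-repo-server Deployment instead, e.g. via a kustomize patch.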

Redis Tuning

# argocd-cmd-params-cm — for large deployments (1000+ Applications)
data:
  redis.server: "argocd-redis-ha-haproxy:6379"
  redis.compression: "gzip"  # Compresses cached manifests; reduces memory by ~40%

Monorepo Performance

Monorepos with 10k+ files can degrade manifest generation significantly. Mitigations:

  1. Limit concurrent Git operations: set ARGOCD_GIT_LS_REMOTE_PARALLELISM=3 on the repo server
  2. Use webhook-driven sync instead of polling
  3. Configure resource.exclusions to skip irrelevant resource kinds
  4. Set timeout.reconciliation: 300s (up from the default 180s)
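Item 3 is configured in argocd-cm. The groups and kinds below are illustrative examples of high-churn resources that rarely matter for sync status:

```yaml
# argocd-cm — exclude noisy resource kinds from watch/reconciliation
data:
  resource.exclusions: |
    - apiGroups: ["events.k8s.io", "metrics.k8s.io"]
      kinds: ["Event", "PodMetrics"]
      clusters: ["*"]
```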

Operational Procedures

Upgrade Strategy

Breaking Changes

Always check the upgrade guide before upgrading. CRD changes may require manual migration.

# 1. Backup current state
kubectl get applications -n argocd -o yaml > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > projects-backup.yaml

# 2. Apply new manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/ha/install.yaml

# 3. Verify health
argocd admin dashboard  # Check UI
argocd app list         # Verify all apps synced

Disaster Recovery

  • Export the full ArgoCD state (Applications, Projects, settings): argocd admin export > backup.yaml
  • Declarative setup: Store all Application CRDs in Git (recommended)
  • Redis persistence: Not critical if Apps are declarative — Redis is a cache
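A minimal declarative Application manifest as it might be stored in Git (repo URL, paths, and names are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook          # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploys.git  # placeholder repo
    targetRevision: main
    path: guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

With every Application kept in Git like this, recovery is just reinstalling ArgoCD and re-applying the manifests.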

Common Issues

Issue                       Root Cause                     Resolution
Sync stuck in Progressing   Resource health check failing  Check resource.customizations.health
ComparisonError             Manifest generation timeout    Increase server.repo.server.timeout.seconds
High memory on controller   Too many watched resources     Enable controller sharding
Webhook not triggering      Secret mismatch                Verify the webhook secret in argocd-secret
SSO login loop              Dex callback URL mismatch      Check url in argocd-cm
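For the first issue, a custom health check can be added under resource.customizations.health in argocd-cm. The Lua sketch below targets a hypothetical CRD (example.com/MyResource); the status fields it inspects are assumptions about that CRD, not a fixed API:

```yaml
# argocd-cm — custom health check for a CRD ArgoCD cannot assess
data:
  resource.customizations.health.example.com_MyResource: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for status to be reported"
    if obj.status ~= nil and obj.status.ready then
      hs.status = "Healthy"
      hs.message = "Resource is ready"
    end
    return hs
```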

Monitoring

Key Metrics (Prometheus)

# Sync operations per app (argocd_app_sync_total is a counter, so rate it)
sum(rate(argocd_app_sync_total[5m])) by (name)

# 95th-percentile reconciliation duration
histogram_quantile(0.95, sum(rate(argocd_app_reconcile_bucket[5m])) by (le))

# Controller work queue depth (should stay near 0)
workqueue_depth{name="app_reconciliation_queue"}

# Repo server Git request rate
sum(rate(argocd_git_request_total[5m])) by (request_type)

# Applications currently reporting Degraded health
argocd_app_info{health_status="Degraded"}

Alerting Rules

- alert: ArgoCDAppOutOfSync
  expr: argocd_app_info{sync_status="OutOfSync"} == 1
  for: 30m
  labels:
    severity: warning

- alert: ArgoCDHighReconcileQueue
  expr: sum(workqueue_depth{name="app_reconciliation_queue"}) > 100
  for: 10m
  labels:
    severity: critical

Resource Requirements

Deployment Size  Apps      Clusters  Controller CPU/Memory   Repo Server CPU/Memory
Small            < 50      1-3       500m / 512Mi            500m / 512Mi
Medium           50-200    3-10      2 / 2Gi                 2 / 2Gi
Large            200-1000  10-50     4 / 4Gi (sharded)       4 / 4Gi (3 replicas)
Enterprise       1000+     50+       8 / 8Gi (multi-shard)   8 / 8Gi (5+ replicas)
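As a sketch, the "Medium" tier could be applied to the controller with a strategic-merge patch (values taken from the table above; adjust limits to your own headroom policy):

```yaml
# Kustomize patch sizing argocd-application-controller for ~50-200 apps
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          resources:
            requests:
              cpu: "2"
              memory: 2Gi
            limits:
              memory: 2Gi
```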