ArgoCD — Operations

Scope

Production deployment patterns, high-availability setup, performance tuning, upgrade procedures, and common operational issues.

Production Deployment

High Availability Architecture

ArgoCD supports HA deployment with multiple replicas of each component:

Component                          Replicas         Notes
argocd-server                      2+               Stateless, load-balanced via Ingress
argocd-repo-server                 2+               CPU-intensive; scales with repo count
argocd-application-controller      1 (sharded)      Leader election; shard across clusters
argocd-redis                       1 (HA optional)  Use Redis Sentinel or Redis Cluster for HA
argocd-dex-server                  2+               Stateless SSO proxy
argocd-applicationset-controller   1                Leader election

# Install HA manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

Controller Sharding

For multi-cluster deployments (50+ clusters), shard the application controller:

# argocd-cmd-params-cm ConfigMap
data:
  controller.sharding.algorithm: "round-robin"  # or "legacy"
  controller.replicas: "3"

Each shard handles a subset of clusters. The controller uses a ConfigMap to coordinate shard assignment.
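The round-robin idea can be illustrated as a simple modulo assignment. This is a hypothetical simplification in Python, not the controller's actual code — the real controller coordinates shard ownership through a ConfigMap and also offers the hash-based "legacy" algorithm:

```python
# Illustrative sketch only: round-robin assignment of clusters to
# controller shards. Function name and data shapes are hypothetical.

def assign_shards(clusters: list[str], replicas: int) -> dict[str, int]:
    """Map each cluster to a shard index, distributing them evenly."""
    # Sorting makes the assignment deterministic across restarts.
    return {cluster: i % replicas
            for i, cluster in enumerate(sorted(clusters))}

shards = assign_shards(["prod-eu", "prod-us", "staging", "dev"], replicas=3)
```

With this scheme each shard owns at most ceil(len(clusters) / replicas) clusters, which is why round-robin tends to balance load better than hashing when cluster counts are small.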

Performance Tuning

Repo Server Optimization

Parameter                            Default        Recommended  Impact
ARGOCD_EXEC_TIMEOUT                  90s            180s         Large Helm charts time out at the default
reposerver.parallelism.limit         0 (unlimited)  10           Caps concurrent manifest generation, preventing CPU spikes
ARGOCD_GIT_ATTEMPTS_COUNT            1              3            Retries transient Git failures
server.repo.server.timeout.seconds   60             120          Accommodates large monorepo syncs
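These settings live in two different places. A sketch of how the ConfigMap keys might be applied (verify the key names against your ArgoCD version):

```yaml
# argocd-cmd-params-cm carries the dotted keys
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  reposerver.parallelism.limit: "10"
  server.repo.server.timeout.seconds: "120"
```

The ALL_CAPS parameters (ARGOCD_EXEC_TIMEOUT, ARGOCD_GIT_ATTEMPTS_COUNT) are environment variables and are set on the argocd-repo-server Deployment instead, e.g. via a kustomize patch.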

Redis Tuning

# argocd-cmd-params-cm — for large deployments (1000+ Applications)
data:
  redis.server: "argocd-redis-ha-haproxy:6379"
  redis.compression: "gzip"  # Compresses cached manifests; reduces memory by ~40%

Monorepo Performance

Monorepos with 10k+ files can degrade manifest generation significantly. Mitigations:

  1. Limit concurrent Git operations: set ARGOCD_GIT_LS_REMOTE_PARALLELISM=3 on the repo server
  2. Use webhook-driven sync instead of polling
  3. Configure resource.exclusions to skip irrelevant resource kinds
  4. Set timeout.reconciliation: 300s (up from the default 180s)
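Item 3 is configured in argocd-cm. The groups and kinds below are illustrative examples of high-churn resources that rarely matter for sync status:

```yaml
# argocd-cm — exclude noisy resource kinds from watch/reconciliation
data:
  resource.exclusions: |
    - apiGroups: ["events.k8s.io", "metrics.k8s.io"]
      kinds: ["Event", "PodMetrics"]
      clusters: ["*"]
```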

Operational Procedures

Upgrade Strategy

Breaking Changes

Always check the upgrade guide before upgrading. CRD changes may require manual migration.

# 1. Backup current state
kubectl get applications -n argocd -o yaml > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > projects-backup.yaml

# 2. Apply new manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/ha/install.yaml

# 3. Verify health
argocd admin dashboard  # Check UI
argocd app list         # Verify all apps synced

Disaster Recovery

  • Export the full ArgoCD state (Applications, Projects, settings): argocd admin export > backup.yaml
  • Declarative setup: Store all Application CRDs in Git (recommended)
  • Redis persistence: Not critical if Apps are declarative — Redis is a cache
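A minimal declarative Application manifest as it might be stored in Git (repo URL, paths, and names are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook          # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploys.git  # placeholder repo
    targetRevision: main
    path: guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

With every Application kept in Git like this, recovery is just reinstalling ArgoCD and re-applying the manifests.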

Common Issues

Issue                       Root Cause                     Resolution
Sync stuck in Progressing   Resource health check failing  Check resource.customizations.health
ComparisonError             Manifest generation timeout    Increase server.repo.server.timeout.seconds
High memory on controller   Too many watched resources     Enable controller sharding
Webhook not triggering      Secret mismatch                Verify the webhook secret in argocd-secret
SSO login loop              Dex callback URL mismatch      Check url in argocd-cm
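For the first issue, a custom health check can be added under resource.customizations.health in argocd-cm. The Lua sketch below targets a hypothetical CRD (example.com/MyResource); the status fields it inspects are assumptions about that CRD, not a fixed API:

```yaml
# argocd-cm — custom health check for a CRD ArgoCD cannot assess
data:
  resource.customizations.health.example.com_MyResource: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for status to be reported"
    if obj.status ~= nil and obj.status.ready then
      hs.status = "Healthy"
      hs.message = "Resource is ready"
    end
    return hs
```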

Monitoring

Key Metrics (Prometheus)

# Sync operations per app (argocd_app_sync_total is a counter, so rate it)
sum(rate(argocd_app_sync_total[5m])) by (name)

# 95th-percentile reconciliation duration
histogram_quantile(0.95, sum(rate(argocd_app_reconcile_bucket[5m])) by (le))

# Controller work queue depth (should stay near 0)
workqueue_depth{name="app_reconciliation_queue"}

# Repo server Git request rate
sum(rate(argocd_git_request_total[5m])) by (request_type)

# Applications currently reporting Degraded health
argocd_app_info{health_status="Degraded"}

Alerting Rules

- alert: ArgoCDAppOutOfSync
  expr: argocd_app_info{sync_status="OutOfSync"} == 1
  for: 30m
  labels:
    severity: warning

- alert: ArgoCDHighReconcileQueue
  expr: sum(workqueue_depth{name="app_reconciliation_queue"}) > 100
  for: 10m
  labels:
    severity: critical

Resource Requirements

Deployment Size  Apps      Clusters  Controller CPU/Memory   Repo Server CPU/Memory
Small            < 50      1-3       500m / 512Mi            500m / 512Mi
Medium           50-200    3-10      2 / 2Gi                 2 / 2Gi
Large            200-1000  10-50     4 / 4Gi (sharded)       4 / 4Gi (3 replicas)
Enterprise       1000+     50+       8 / 8Gi (multi-shard)   8 / 8Gi (5+ replicas)
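As a sketch, the "Medium" tier could be applied to the controller with a strategic-merge patch (values taken from the table above; adjust limits to your own headroom policy):

```yaml
# Kustomize patch sizing argocd-application-controller for ~50-200 apps
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          resources:
            requests:
              cpu: "2"
              memory: 2Gi
            limits:
              memory: 2Gi
```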