ArgoCD: Operations
Scope
Production deployment patterns, high-availability setup, performance tuning, upgrade procedures, and common operational issues.
Production Deployment
High Availability Architecture
ArgoCD supports HA deployment. Components that scale horizontally run multiple replicas; the rest stay at a single replica:
| Component | Replicas | Notes |
|---|---|---|
| argocd-server | 2+ | Stateless, load-balanced via Ingress |
| argocd-repo-server | 2+ | CPU-intensive; scales with repo count |
| argocd-application-controller | 1 (sharded) | StatefulSet; add replicas to shard across clusters |
| argocd-redis | 1 (HA optional) | The HA manifests deploy Redis with Sentinel behind HAProxy |
| argocd-dex-server | 1 | Keeps state in memory; multiple replicas would hold inconsistent data |
| argocd-applicationset-controller | 1 | Leader election |
# Install HA manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml
Controller Sharding
For multi-cluster deployments (50+ clusters), shard the application controller:
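# argocd-cmd-params-cm ConfigMap
data:
  controller.sharding.algorithm: "round-robin" # or "legacy"
# The shard count is the replica count of the argocd-application-controller
# StatefulSet, mirrored in its ARGOCD_CONTROLLER_REPLICAS environment variable ("3" for three shards).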
Each shard handles a subset of clusters. The controller uses a ConfigMap to coordinate shard assignment.
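As a minimal sketch of how the shard count is usually set, the patch below scales the controller StatefulSet to three replicas and repeats that number in the ARGOCD_CONTROLLER_REPLICAS environment variable. It assumes the default resource and container names from the upstream manifests and the argocd namespace used elsewhere on this page; adjust both if your installation differs.

```yaml
# Strategic-merge patch for the application controller (sketch, not a full manifest).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 3                      # one pod per shard
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            # Must match spec.replicas so every pod computes the same shard layout.
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3"
```

Keeping the two values in sync matters: if they drift apart, some clusters may be left without a controller shard that considers itself responsible for them.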
Performance Tuning
Repo Server Optimization
| Parameter | Default | Recommended | Impact |
|---|---|---|---|
| ARGOCD_EXEC_TIMEOUT | 90s | 180s | Large Helm charts timeout |
| reposerver.parallelism.limit | 0 (unlimited) | 10 | Prevents CPU spikes |
| ARGOCD_GIT_ATTEMPTS_COUNT | 1 | 3 | Retries on transient failures |
| server.repo.server.timeout.seconds | 60 | 120 | Large monorepo sync |
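The parameters in the table live in two places: dotted keys go into the argocd-cmd-params-cm ConfigMap, while the upper-case names are environment variables on the repo server container. The sketch below applies the recommended values; it assumes the default resource and container names from the upstream manifests and is a patch, not a complete Deployment.

```yaml
# Dotted keys: argocd-cmd-params-cm (read at startup, so restart the affected pods).
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  reposerver.parallelism.limit: "10"
  server.repo.server.timeout.seconds: "120"
---
# Upper-case names: environment variables on the repo server container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-repo-server
          env:
            - name: ARGOCD_EXEC_TIMEOUT
              value: "180s"
            - name: ARGOCD_GIT_ATTEMPTS_COUNT
              value: "3"
```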
Redis Tuning
# For large deployments (1000+ Applications)
data:
  redis.server: "argocd-redis-ha-haproxy:6379"
  redis.compression: "gzip" # Reduces memory by ~40%
Monorepo Performance
Monorepos with 10k+ files degrade performance significantly:
- Limit concurrent Git ls-remote requests: Set ARGOCD_GIT_LS_REMOTE_PARALLELISM=3
- Use webhook-driven sync instead of polling
- Configure resource.exclusions to skip irrelevant resource kinds
- Set timeout.reconciliation: 300s (up from default 180s)
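The last two settings in the list above belong in the argocd-cm ConfigMap. The fragment below is a sketch: the excluded API group and kind (cilium.io / CiliumIdentity) are only an illustrative choice of a high-churn resource, not something this page prescribes. Each resource.exclusions entry matches on API groups, kinds, and clusters.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Poll Git every 5 minutes instead of the 180s default.
  timeout.reconciliation: 300s
  # Stop watching resource kinds that no Application manages (example entry).
  resource.exclusions: |
    - apiGroups:
        - cilium.io
      kinds:
        - CiliumIdentity
      clusters:
        - "*"
```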
Operational Procedures
Upgrade Strategy
Breaking Changes
Always check the upgrade guide before upgrading. CRD changes may require manual migration.
# 1. Backup current state
kubectl get applications -n argocd -o yaml > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > projects-backup.yaml
# 2. Apply new manifests
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/ha/install.yaml
# 3. Verify health
argocd admin dashboard # Check UI
argocd app list # Verify all apps synced
Disaster Recovery
- Export all Applications: argocd admin export > backup.yaml
- Declarative setup: Store all Application manifests in Git (recommended)
- Redis persistence: Not critical if Apps are declarative, because Redis is only a cache
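For the declarative approach, each Application is an ordinary Kubernetes manifest that can be committed to Git and re-applied after a cluster loss. The sketch below uses the public argocd-example-apps repository and a guestbook application purely as placeholders; the repository URL, path, and names are assumptions to replace with your own.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook            # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: guestbook
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD runs in
    namespace: guestbook
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift
```

With manifests like this in Git, recovery reduces to reinstalling ArgoCD and applying the Application and AppProject manifests again.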
Common Issues
| Issue | Root Cause | Resolution |
|---|---|---|
| Sync stuck in Progressing | Resource health check failing | Check resource.customizations.health |
| ComparisonError | Manifest generation timeout | Increase controller.repo.server.timeout.seconds and ARGOCD_EXEC_TIMEOUT |
| High memory on controller | Too many watched resources | Enable controller sharding |
| Webhook not triggering | Secret mismatch | Verify webhook secret in argocd-secret |
| SSO login loop | Dex callback URL mismatch | Check url in argocd-cm |
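For the "Sync stuck in Progressing" row, a custom health check tells ArgoCD how to read a resource's status. The sketch below follows the pattern of the cert-manager Certificate example from the ArgoCD documentation; the resource kind is only an illustration, and the Lua logic should be adapted to whatever status conditions your resource exposes.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Key format: resource.customizations.health.<group>_<kind>
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for i, condition in ipairs(obj.status.conditions) do
        if condition.type == "Ready" and condition.status == "False" then
          hs.status = "Degraded"
          hs.message = condition.message
          return hs
        end
        if condition.type == "Ready" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = condition.message
          return hs
        end
      end
    end
    -- No Ready condition yet: report Progressing rather than guessing.
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs
```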
Monitoring
Key Metrics (Prometheus)
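# Sync rate per app (argocd_app_sync_total is a counter, not a histogram)
sum by (name) (rate(argocd_app_sync_total[5m]))
# p95 reconciliation duration
histogram_quantile(0.95, sum by (le) (rate(argocd_app_reconcile_bucket[5m])))
# Controller queue depth (should be near 0)
workqueue_depth{name="app_reconciliation_queue"}
# Repo server Git request rate
sum by (request_type) (rate(argocd_git_request_total[5m]))
# Application health status
argocd_app_info{health_status="Degraded"}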
Alerting Rules
- alert: ArgoCDAppOutOfSync
  expr: argocd_app_info{sync_status="OutOfSync"} == 1
  for: 30m
  labels:
    severity: warning
- alert: ArgoCDHighReconcileQueue
  expr: workqueue_depth{name="app_reconciliation_queue"} > 100
  for: 10m
  labels:
    severity: critical
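Rules like the ones above need to sit inside a rule group before Prometheus will load them. The sketch below shows that wrapper with one additional rule built on the argocd_app_info health label; the group name, the 15m window, and the annotation text are assumptions to tune for your environment.

```yaml
# Prometheus rule file (sketch).
groups:
  - name: argocd            # hypothetical group name
    rules:
      - alert: ArgoCDAppDegraded
        expr: argocd_app_info{health_status="Degraded"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "ArgoCD application {{ $labels.name }} is Degraded"
```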
Resource Requirements
| Deployment Size | Apps | Clusters | Controller CPU/Memory | Repo Server CPU/Memory |
|---|---|---|---|---|
| Small | < 50 | 1-3 | 500m / 512Mi | 500m / 512Mi |
| Medium | 50-200 | 3-10 | 2 / 2Gi | 2 / 2Gi |
| Large | 200-1000 | 10-50 | 4 / 4Gi (sharded) | 4 / 4Gi (3 replicas) |
| Enterprise | 1000+ | 50+ | 8 / 8Gi (multi-shard) | 8 / 8Gi (5+ replicas) |
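As an illustration of applying a row from the table, the patch below sets the Medium tier (2 CPU / 2Gi) on the application controller. The table does not say whether its figures are requests or limits, so treating them as requests with a matching memory limit is an assumption; the resource and container names are the upstream defaults.

```yaml
# Strategic-merge patch for the Medium tier (sketch).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          resources:
            requests:
              cpu: "2"
              memory: 2Gi
            limits:
              memory: 2Gi   # assumption: cap memory, leave CPU unthrottled
```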