Operations
Scope
Production deployment patterns, high-availability setup, performance tuning, upgrade procedures, and common operational issues.
Production Deployment
High Availability Architecture
ArgoCD supports an HA deployment that runs multiple replicas of most components:
| Component | Replicas | Notes |
|---|---|---|
| argocd-server | 2+ | Stateless, load-balanced via Ingress |
| argocd-repo-server | 2+ | CPU-intensive; scales with repo count |
| argocd-application-controller | 1 per shard | Scale out by sharding clusters across replicas |
| argocd-redis | 1 (HA optional) | Use Redis Sentinel or Redis Cluster for HA |
| argocd-dex-server | 1 | Keeps state in memory, so multiple replicas would hold inconsistent data |
| argocd-applicationset-controller | 1 | Leader election |
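```bash
# Install HA manifests (the namespace must exist first)
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml
```
The HA manifests use pod anti-affinity rules, so they require at least three nodes.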
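Controller Sharding
For multi-cluster deployments (50+ clusters), shard the application controller so that each replica manages a subset of clusters:
```yaml
# argocd-cmd-params-cm ConfigMap
data:
  controller.sharding.algorithm: "round-robin"  # or "legacy" (the default)
```
Set the number of shards by scaling the argocd-application-controller StatefulSet and setting its ARGOCD_CONTROLLER_REPLICAS environment variable to the same value. Each replica derives its shard from its StatefulSet ordinal.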
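Scaling the controller happens on its StatefulSet. A minimal sketch of a strategic-merge patch for three shards, assuming the container name used in the upstream manifests (argocd-application-controller); the file name controller-shards-patch.yaml is hypothetical:

```yaml
# controller-shards-patch.yaml (hypothetical file name)
# Strategic-merge patch: three controller replicas, one shard each.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            # Keep this equal to spec.replicas so each replica computes its shard correctly
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3"
```

Apply it with kubectl patch statefulset argocd-application-controller -n argocd --patch-file controller-shards-patch.yaml, or list it as a patch in a Kustomize overlay so the shard count stays in Git.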
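Performance Tuning
Repo Server Optimization
| Parameter | Default | Recommended | Impact |
|---|---|---|---|
| ARGOCD_EXEC_TIMEOUT | 90s | 180s | Prevents manifest generation timeouts on large Helm charts |
| reposerver.parallelism.limit | 0 (unlimited) | 10 | Caps concurrent manifest generation to prevent CPU spikes |
| ARGOCD_GIT_ATTEMPTS_COUNT | 1 | 3 | Retries Git operations on transient failures |
| server.repo.server.timeout.seconds | 60 | 120 | Avoids timeouts on large monorepos |

The ARGOCD_* entries are environment variables on the argocd-repo-server container; the other two are keys in the argocd-cmd-params-cm ConfigMap.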
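A sketch of where the recommended values go, assuming the upstream object names (argocd-cmd-params-cm, argocd-repo-server). Merge the ConfigMap keys into the existing ConfigMap rather than replacing it, and treat the Deployment document as a strategic-merge patch, not a full manifest:

```yaml
# argocd-cmd-params-cm: settings read by the repo server and API server at startup
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  reposerver.parallelism.limit: "10"
  server.repo.server.timeout.seconds: "120"
---
# Patch excerpt: environment variables on the repo server container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-repo-server
          env:
            - name: ARGOCD_EXEC_TIMEOUT
              value: "180s"
            - name: ARGOCD_GIT_ATTEMPTS_COUNT
              value: "3"
```

Components read argocd-cmd-params-cm only at startup, so restart them afterwards, for example with kubectl -n argocd rollout restart deployment/argocd-repo-server deployment/argocd-server.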
Redis Tuning
```yaml
# argocd-cmd-params-cm, for large deployments (1000+ Applications)
data:
  redis.server: "argocd-redis-ha-haproxy:6379"
  redis.compression: "gzip"  # default; compresses cached data to reduce Redis memory
```
Monorepo Performance
Monorepos with 10k+ files can degrade performance significantly. Mitigations:
- Limit concurrent git ls-remote calls: set ARGOCD_GIT_LS_REMOTE_PARALLELISM=3
- Use webhook-driven sync instead of polling
- Configure resource.exclusions to skip irrelevant resource kinds
- Set timeout.reconciliation: 300s (up from the default 180s)
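Both argocd-cm settings from this list can be sketched in one ConfigMap. The excluded CiliumIdentity kind is only an illustration of a high-churn resource; substitute the kinds that are actually irrelevant in your clusters:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Poll Git every 5 minutes instead of the 180s default
  timeout.reconciliation: 300s
  # Illustrative exclusion: stop watching a high-churn kind on all clusters
  resource.exclusions: |
    - apiGroups:
        - cilium.io
      kinds:
        - CiliumIdentity
      clusters:
        - "*"
```

Lengthening the polling interval is safer when webhooks are configured, because pushes still trigger an immediate refresh.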
Operational Procedures
Upgrade Strategy
Breaking Changes
Always check the upgrade guide before upgrading. CRD changes may require manual migration.
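```bash
# 1. Back up current state
kubectl get applications -n argocd -o yaml > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > projects-backup.yaml
# 2. Apply new manifests (pin the target version)
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/ha/install.yaml
# 3. Verify health
kubectl -n argocd rollout status deployment/argocd-server
kubectl -n argocd rollout status statefulset/argocd-application-controller
argocd admin dashboard   # Serve the UI locally and check that it loads
argocd app list          # Verify all apps are Synced and Healthy
```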
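One way to make the pinned version in step 2 explicit and reviewable is to keep a Kustomize file in Git, so an upgrade becomes a one-line change. A minimal sketch, assuming kubectl's built-in Kustomize support and the same v2.13.0 HA manifest:

```yaml
# kustomization.yaml: pins the Argo CD release installed into the argocd namespace
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  # Bump this version tag to upgrade, after reading the upgrade guide
  - https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/ha/install.yaml
```

Apply it with kubectl apply -k pointed at the directory that contains kustomization.yaml.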
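Disaster Recovery
- Export all Argo CD data (Applications, AppProjects, settings): argocd admin export > backup.yaml, and restore it with argocd admin import - < backup.yaml
- Declarative setup: Store all Application CRDs in Git (recommended)
- Redis persistence: Not critical if Applications are declarative, because Redis is only a cache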
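A declarative Application stored in Git can be recreated from the repository alone. A sketch that mirrors the myapp example from the argocd app create command in Commands & Recipes (same repo URL, path, and destination, which are themselves placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd          # Applications live in the Argo CD namespace
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true            # equivalent of --auto-prune
      selfHeal: true         # equivalent of --self-heal
```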
Common Issues
| Issue | Root Cause | Resolution |
|---|---|---|
| Sync stuck in Progressing | Resource health check never reports Healthy | Check resource.customizations.health in argocd-cm |
| ComparisonError | Manifest generation timeout | Increase ARGOCD_EXEC_TIMEOUT and controller.repo.server.timeout.seconds |
| High memory on controller | Too many watched resources | Enable controller sharding |
| Webhook not triggering | Secret mismatch | Verify webhook secret in argocd-secret |
| SSO login loop | Dex callback URL mismatch | Check url in argocd-cm |
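Two of these fixes live in argocd-cm. A sketch, in which the hostname, the CRD (group example.com, kind MyResource), and its status.ready field are all hypothetical:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # External URL of Argo CD; Dex builds its SSO callback from it (hypothetical hostname)
  url: https://argocd.example.com
  # Lua health check for a hypothetical CRD; key format is <group>_<kind>
  resource.customizations.health.example.com_MyResource: |
    hs = {}
    if obj.status ~= nil and obj.status.ready == true then
      hs.status = "Healthy"
      hs.message = "Resource reports ready"
    else
      hs.status = "Progressing"
      hs.message = "Waiting for status.ready"
    end
    return hs
```

Without a health check like this, Argo CD cannot tell when a custom resource has finished rolling out, which is a common reason for the stuck-in-Progressing symptom.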
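Monitoring
Key Metrics (Prometheus)
```promql
# p95 reconciliation duration (argocd_app_reconcile is a histogram)
histogram_quantile(0.95, sum(rate(argocd_app_reconcile_bucket[5m])) by (le))
# Controller reconciliation queue depth (should be near 0)
workqueue_depth{name="app_reconciliation_queue"}
# Git request rate from the repo server
sum(rate(argocd_git_request_total[5m]))
# Applications with Degraded health
argocd_app_info{health_status="Degraded"}
```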
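These queries assume Prometheus already scrapes Argo CD. With the Prometheus Operator, a sketch of a ServiceMonitor for the controller's argocd-metrics Service; the release: prometheus label is an assumption and must match your Prometheus serviceMonitorSelector:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    release: prometheus   # assumption: match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics       # named port on the argocd-metrics Service
```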
Alerting Rules
```yaml
- alert: ArgoCDAppOutOfSync
  expr: argocd_app_info{sync_status="OutOfSync"} == 1
  for: 30m
  labels:
    severity: warning
- alert: ArgoCDHighReconcileQueue
  expr: sum(workqueue_depth{name="app_reconciliation_queue"}) > 100
  for: 10m
  labels:
    severity: critical
```
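Prometheus loads rule entries like these from a rules file in which each rule sits inside a named group (a PrometheusRule object uses the same structure under spec.groups). A sketch of that structure with one more alert built on argocd_app_info; the group name, the 15m window, and the summary text are assumptions to tune:

```yaml
groups:
  - name: argocd   # hypothetical group name
    rules:
      - alert: ArgoCDAppDegraded
        expr: argocd_app_info{health_status="Degraded"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Application {{ $labels.name }} has been Degraded for 15 minutes"
```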
Resource Requirements
| Deployment Size | Apps | Clusters | Controller CPU/Memory | Repo Server CPU/Memory |
|---|---|---|---|---|
| Small | < 50 | 1-3 | 500m / 512Mi | 500m / 512Mi |
| Medium | 50-200 | 3-10 | 2 / 2Gi | 2 / 2Gi |
| Large | 200-1000 | 10-50 | 4 / 4Gi (sharded) | 4 / 4Gi (3 replicas) |
| Enterprise | 1000+ | 50+ | 8 / 8Gi (multi-shard) | 8 / 8Gi (5+ replicas) |
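The table gives sizes, not manifests. A sketch of the Medium row applied to the controller container, assuming the table's figures are requests and choosing to cap memory at the same value while leaving CPU unlimited (one common choice that avoids throttling during reconciliation bursts, not an upstream default):

```yaml
# Patch excerpt for the argocd-application-controller StatefulSet ("Medium" row)
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          resources:
            requests:
              cpu: "2"
              memory: 2Gi
            limits:
              memory: 2Gi   # assumption: limit equals request; no CPU limit set
```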
Commands & Recipes
Installation
```bash
# Install ArgoCD on K8s
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Install CLI
curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64
# Get initial admin password
argocd admin initial-password -n argocd
# Expose the API server locally (run in a separate terminal), then log in as admin
kubectl port-forward svc/argocd-server -n argocd 8080:443
argocd login localhost:8080
```
Application Management
```bash
# Create application
argocd app create myapp \
  --repo https://github.com/org/repo.git \
  --path k8s/overlays/production \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace production \
  --sync-policy automated \
  --auto-prune --self-heal
# Sync (manual)
argocd app sync myapp
# Sync with specific revision
argocd app sync myapp --revision feature-branch
# View app status
argocd app get myapp
argocd app diff myapp
# Roll back to an ID from `argocd app history myapp`
# (automated sync must be disabled first; rollback is refused while it is enabled)
argocd app rollback myapp <history-id>
# Delete the app and its managed resources (--cascade is the default)
argocd app delete myapp --cascade
```
Multi-Cluster
```bash
# Add target cluster
argocd cluster add my-context --name production-cluster
# List clusters
argocd cluster list
```
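argocd cluster add creates a ServiceAccount in the target cluster and stores its credentials in a Secret in the argocd namespace. For a fully declarative setup that Secret can be written directly. A sketch with placeholder values; the server URL, token, and CA data are hypothetical:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: production-cluster
  namespace: argocd
  labels:
    # This label is what makes Argo CD treat the Secret as a cluster definition
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: production-cluster
  server: https://production.example.com:6443   # hypothetical API server URL
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-certificate>"
      }
    }
```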
ApplicationSet
```yaml
# Deploy every addons/* directory in the infra repo to every registered cluster
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - clusters: {}
          - git:
              repoURL: https://github.com/org/infra.git
              revision: HEAD
              directories:
                - path: addons/*
  template:
    metadata:
      name: "{{name}}-{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/infra.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: "{{server}}"
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true  # create each add-on's namespace if it does not exist
```