Kubernetes — Benchmarks

Scope

Kubernetes scalability limits, API server performance, etcd throughput, and scheduling benchmarks.

Official Scalability Targets (SIG Scalability)

Kubernetes officially tests and targets these limits per cluster:

| Dimension | Target | Notes |
|---|---|---|
| Nodes | 5,000 | Tested by SIG Scalability |
| Pods | 150,000 | 30 pods/node average |
| Pods per node | 110 | Kubelet default `maxPods` |
| Services | 10,000 | |
| Endpoints per Service | 5,000 | Beyond this, use EndpointSlices |
| Namespaces | 10,000 | |
| ConfigMaps | 30,000 | |
| Secrets | 30,000 | |
| Total containers | ~300,000 | SIG Scalability threshold; also bounded by the etcd storage quota |
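
As a quick sanity check, a planned cluster's dimensions can be compared against the tested limits above. A minimal sketch (the limit values come from the table; the planned cluster numbers are hypothetical):

```python
# Tested per-cluster limits from SIG Scalability (values from the table above).
LIMITS = {
    "nodes": 5_000,
    "pods": 150_000,
    "pods_per_node": 110,
    "services": 10_000,
}

def over_limits(cluster: dict) -> list[str]:
    """Return the names of any dimensions that exceed the tested limits."""
    return [dim for dim, limit in LIMITS.items() if cluster.get(dim, 0) > limit]

# Hypothetical cluster: 6,000 nodes is beyond the tested 5,000-node target.
planned = {"nodes": 6_000, "pods": 120_000, "pods_per_node": 30, "services": 800}
print(over_limits(planned))  # → ['nodes']
```

Staying inside these limits does not guarantee performance, but exceeding them means running in territory SIG Scalability does not test.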

API Server Performance

| Metric | Target SLO | Notes |
|---|---|---|
| API request latency (mutating, P99) | < 1s | At 5,000-node scale |
| API request latency (non-mutating, P99) | < 5s | Namespace-scoped list calls |
| API request latency (P50) | < 100ms | Typical single-object operations |
| Pod startup latency (P99) | < 5s | Time to pod ready from API call, images pre-pulled |
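
The P99 targets above can be checked against scraped latency samples. A standard-library sketch using the nearest-rank percentile method (the sample data is synthetic):

```python
# Compare measured API request latencies (in seconds) against the P99 SLO.
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value >= pct% of the samples."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# Synthetic mutating-call latencies: mostly fast, with a slow tail.
latencies = [0.05] * 950 + [0.4] * 45 + [0.9] * 5
p99 = percentile(latencies, 99)
print(f"P99 = {p99:.2f}s, SLO met: {p99 < 1.0}")  # → P99 = 0.40s, SLO met: True
```

In practice these numbers come from the apiserver's `apiserver_request_duration_seconds` histogram rather than raw samples, but the comparison against the SLO is the same.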

etcd Performance

| Cluster size | WAL fsync P99 | Read latency P99 | Write QPS | Storage quota |
|---|---|---|---|---|
| < 100 nodes | < 5ms | < 10ms | 1,000 | 2Gi |
| 100-500 nodes | < 10ms | < 25ms | 5,000 | 4Gi |
| 500-5,000 nodes | < 10ms | < 50ms | 10,000 | 8Gi |

Disk Latency is Critical

etcd writes its WAL sequentially and fsyncs each entry before acknowledging it. A disk whose P99 fsync latency exceeds ~10ms causes missed heartbeats, spurious leader elections, and cascading cluster instability.
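
The fsync behaviour can be probed with a small micro-benchmark that mimics etcd's WAL pattern: append a small record, then sync it to disk. A rough sketch, not a substitute for a proper disk benchmark:

```python
import math
import os
import tempfile
import time

def fsync_p99_ms(directory: str = ".", writes: int = 200,
                 record: bytes = b"x" * 2048) -> float:
    """Append small records with an fsync after each, etcd-WAL style,
    and return the P99 sync latency in milliseconds."""
    samples = []
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(writes):
            f.write(record)
            start = time.perf_counter()
            f.flush()
            os.fsync(f.fileno())
            samples.append((time.perf_counter() - start) * 1_000)
    samples.sort()
    return samples[math.ceil(0.99 * len(samples)) - 1]  # nearest-rank P99

p99 = fsync_p99_ms()
print(f"fsync P99: {p99:.2f}ms -> {'OK for etcd' if p99 < 10 else 'too slow'}")
```

Point `directory` at the volume backing etcd's data dir; numbers from tmpfs or a battery-backed write cache are not representative. A more rigorous alternative is `fio` with `--fdatasync=1` and a small block size.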

Scheduling Performance

| Scheduler metric | Value | Conditions |
|---|---|---|
| Scheduling throughput | ~100 pods/sec | Default scheduler, 5,000-node cluster |
| Scheduling latency (P99) | < 100ms | Without complex affinity rules |
| Scheduling with affinity | 20-50 pods/sec | Pod anti-affinity across nodes |
| Preemption overhead | +50-100ms | When preemption is triggered |
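
These throughput figures translate directly into rollout time. A back-of-the-envelope sketch using the table's values against the 150,000-pod scale target:

```python
# Rough time to schedule a pod population at a given scheduler throughput.
def schedule_time_min(pods: int, pods_per_sec: float) -> float:
    return pods / pods_per_sec / 60

print(schedule_time_min(150_000, 100))  # default scheduler → 25.0 minutes
print(schedule_time_min(150_000, 20))   # heavy anti-affinity → 125.0 minutes
```

The gap between the two lines is why large batch workloads avoid cluster-wide pod anti-affinity: the same rollout takes five times longer at the low end of the affinity throughput range.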

Network Performance (CNI Comparison)

| CNI | Pod-to-pod latency | Throughput (TCP) | Throughput (eBPF) | Encryption overhead |
|---|---|---|---|---|
| Cilium | ~50µs | 9.5 Gbps | 9.8 Gbps (native) | 15-20% (WireGuard) |
| Calico | ~60µs | 9.2 Gbps | 9.5 Gbps (eBPF mode) | 20-25% (WireGuard) |
| Flannel (VXLAN) | ~80µs | 8.5 Gbps | N/A | N/A (no native encryption) |
| Host networking | ~30µs | 10 Gbps | N/A | N/A |
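
The encryption-overhead column compounds with baseline throughput. A small sketch computing effective pod-to-pod throughput from the table's numbers, taking the worst-case overhead for each CNI:

```python
# Effective throughput after WireGuard overhead (values from the table above).
def effective_gbps(baseline_gbps: float, overhead_pct: float) -> float:
    return baseline_gbps * (1 - overhead_pct / 100)

print(round(effective_gbps(9.5, 20), 2))  # Cilium + WireGuard, worst case → 7.6
print(round(effective_gbps(9.2, 25), 2))  # Calico + WireGuard, worst case → 6.9
```

In other words, enabling WireGuard can cost 2-3 Gbps of pod-to-pod bandwidth on a 10 Gbps link, which matters for storage replication and other east-west-heavy traffic.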

Real-World Scale References

  • Google GKE: Supports 15,000 nodes per cluster (managed)
  • AWS EKS: Up to 5,000 nodes with managed control plane
  • OpenAI: Runs 7,500-node clusters for ML training
  • Alibaba Cloud: Reported testing at 10,000+ nodes