# Kubernetes — Benchmarks
**Scope:** Kubernetes scalability limits, API server performance, etcd throughput, and scheduling benchmarks.
## Official Scalability Targets (SIG Scalability)
Kubernetes officially tests and targets these limits per cluster:
| Dimension | Target | Notes |
|---|---|---|
| Nodes | 5,000 | Tested by SIG Scalability |
| Pods | 150,000 | 30 pods/node avg |
| Pods per node | 110 | Kubelet default maxPods |
| Services | 10,000 | |
| Endpoints per Service | 5,000 | Beyond this, use EndpointSlices |
| Namespaces | 10,000 | |
| ConfigMaps | 30,000 | |
| Secrets | 30,000 | |
| Total API objects | ~300,000 | Bounded by the etcd storage quota (8 GiB recommended max) |
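The targets in the table can be encoded as a simple pre-flight check for a planned cluster (a minimal sketch; the `TARGETS` dict mirrors the table, and the sample `plan` numbers are hypothetical):

```python
# Published per-cluster scalability targets from the table above.
TARGETS = {
    "nodes": 5_000,
    "pods": 150_000,
    "pods_per_node": 110,
    "services": 10_000,
    "endpoints_per_service": 5_000,
}

def check_cluster(plan: dict) -> list[str]:
    """Return a warning for every dimension that exceeds its tested target."""
    return [
        f"{dim}: planned {plan[dim]:,} exceeds tested target {limit:,}"
        for dim, limit in TARGETS.items()
        if plan.get(dim, 0) > limit
    ]

# Hypothetical cluster plan: only the node count is over the tested limit.
plan = {"nodes": 6_000, "pods": 120_000, "pods_per_node": 20,
        "services": 4_000, "endpoints_per_service": 800}
warnings = check_cluster(plan)
```

Staying inside these tested envelopes matters because the SLOs in the following sections are only validated up to them.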
## API Server Performance
| Metric | Target SLO | Notes |
|---|---|---|
| API request latency (mutating, P99) | < 1s | At 5,000-node scale |
| API request latency (non-mutating, P99) | < 5s | Namespace-scoped list calls; cluster-scoped lists target < 30s |
| API request latency (P50) | < 100ms | Typical single-object operations |
| Pod startup latency (P99) | < 5s | Stateless pod running from API call, excluding image pull |
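The API server exposes these latencies as Prometheus-style cumulative histograms (`apiserver_request_duration_seconds`). Turning the raw buckets into a P99 figure means interpolating within the bucket that contains the quantile, the same way PromQL's `histogram_quantile()` does — a sketch, with made-up bucket counts:

```python
# Estimate a latency quantile from cumulative (upper_bound, count) buckets,
# the format Prometheus histograms (and the API server's metrics) use.

def histogram_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Linear interpolation within the bucket holding the q-quantile,
    mirroring PromQL's histogram_quantile()."""
    total = buckets[-1][1]          # the +Inf bucket count == total samples
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound   # PromQL caps at the highest finite bound
            if count == prev_count:
                return bound        # degenerate (empty) bucket guard
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative buckets: 1,000 requests total.
buckets = [(0.05, 600), (0.1, 900), (0.5, 980), (1.0, 995), (float("inf"), 1000)]
p99 = histogram_quantile(0.99, buckets)   # falls in the (0.5, 1.0] bucket
```

In practice you would run `histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket[5m]))` in PromQL and compare the result against the SLO table above.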
## etcd Performance
| Cluster Size | WAL fsync P99 | Read latency P99 | Write QPS | Storage |
|---|---|---|---|---|
| < 100 nodes | < 5ms | < 10ms | 1,000 | 2Gi |
| 100-500 nodes | < 10ms | < 25ms | 5,000 | 4Gi |
| 500-5000 nodes | < 10ms | < 50ms | 10,000 | 8Gi |
**Disk latency is critical.** etcd appends to its WAL sequentially and fsyncs every commit; a disk with fsync latency above ~10ms causes missed heartbeats, spurious leader elections, and cascading cluster instability.
## Scheduling Performance
| Scheduler Metric | Value | Conditions |
|---|---|---|
| Scheduling throughput | ~100 pods/sec | Default scheduler, 5000-node cluster |
| Scheduling latency (P99) | < 100ms | Without complex affinity rules |
| Scheduling with affinity | 20-50 pods/sec | Pod anti-affinity across nodes |
| Preemption overhead | +50-100ms | When preemption kicks in |
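These throughput figures translate directly into rollout time, which is why affinity-heavy workloads feel an order of magnitude slower at scale. A back-of-the-envelope sketch, using a hypothetical 10,000-pod deployment and the table's own numbers:

```python
def rollout_seconds(pods: int, pods_per_sec: float) -> float:
    """Idealized time to schedule `pods` at a steady scheduling rate,
    ignoring image pulls, kubelet startup, and API-server backpressure."""
    return pods / pods_per_sec

plain = rollout_seconds(10_000, 100)     # default scheduler: ~100 s
affinity = rollout_seconds(10_000, 30)   # anti-affinity-heavy: ~333 s
```

The gap widens further on large clusters, since anti-affinity checks scale with the number of candidate nodes.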
## Network Performance (CNI Comparison)
| CNI | Pod-to-Pod Latency | Throughput (TCP) | Throughput (eBPF) | Encryption Overhead |
|---|---|---|---|---|
| Cilium | ~50 µs | 9.5 Gbps | 9.8 Gbps (native) | 15-20% (WireGuard) |
| Calico | ~60 µs | 9.2 Gbps | 9.5 Gbps (eBPF mode) | 20-25% (WireGuard) |
| Flannel (VXLAN) | ~80 µs | 8.5 Gbps | N/A | None built in |
| Host networking | ~30 µs | 10 Gbps (line rate) | N/A | N/A |
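Reading the overhead column against the TCP column gives the effective throughput once encryption is on — a quick sketch using the table's own numbers:

```python
def effective_gbps(baseline_gbps: float, overhead_pct: float) -> float:
    """Throughput remaining after a percentage encryption overhead."""
    return baseline_gbps * (1 - overhead_pct / 100)

cilium_wg = effective_gbps(9.5, 20)   # worst case from the table: ~7.6 Gbps
calico_wg = effective_gbps(9.2, 25)   # worst case from the table: ~6.9 Gbps
```

On a 10 Gbps fabric, WireGuard can therefore cost two to three Gbps of pod-to-pod bandwidth — worth measuring before enabling it cluster-wide.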
## Real-World Scale References
- Google GKE: Supports 15,000 nodes per cluster (managed)
- AWS EKS: Up to 5,000 nodes with managed control plane
- OpenAI: Runs 7,500-node clusters for ML training
- Alibaba Cloud: Reported testing at 10,000+ nodes