Kubernetes
The industry-standard container orchestration platform for automating deployment, scaling, and management of containerized workloads across clusters of machines.
Overview
Kubernetes (K8s) is a production-grade container orchestration system originally developed by Google and now maintained by the Cloud Native Computing Foundation (CNCF). It implements a desired-state model where controllers continuously reconcile actual state with declared intent. Kubernetes manages the full lifecycle of containerized applications: scheduling, scaling, networking, storage, and self-healing.
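The desired-state model can be illustrated with a toy reconciliation step. This is a minimal sketch in Python, not Kubernetes code: `reconcile`, `apply`, the pod names, and the list-based state are hypothetical stand-ins for what a real controller does through the API server.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move observed state toward declared intent.

    desired_replicas: declared replica count (hypothetical stand-in for spec.replicas)
    running_pods: list of pod names currently observed
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: create the missing ones.
        return [("create", f"pod-new-{i}") for i in range(diff)]
    if diff < 0:
        # Too many pods: delete the surplus (here simply the last ones listed).
        return [("delete", name) for name in running_pods[diff:]]
    return []


def apply(actions, running_pods):
    """Apply actions to an in-memory pod list and return the new list."""
    pods = list(running_pods)
    for verb, name in actions:
        if verb == "create":
            pods.append(name)
        else:
            pods.remove(name)
    return pods


# One pod crashed: the loop observes 2 pods while 3 are declared.
observed = ["pod-a", "pod-b"]
observed = apply(reconcile(3, observed), observed)

# A second pass finds nothing to do, which is what makes the loop safe to repeat.
steady = reconcile(3, observed)
```

The point of the sketch is that the controller never records "what happened"; it only compares the declared count with what it currently observes, so a crash, a manual deletion, and a scale-up are all handled by the same comparison.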
Repository & Community
| Attribute | Detail |
|---|---|
| Repository | github.com/kubernetes/kubernetes |
| Stars | 115k+ ⭐ |
| Latest Stable | v1.35.3 (April 2026); v1.36 due April 22, 2026 |
| Language | Go |
| License | Apache 2.0 |
| Governance | CNCF (Linux Foundation) |
| Contributors | 9,000+ |
Evaluation
- Why it's better: It is cloud-agnostic and declarative, has a massive ecosystem (CNCF landscape), provides self-healing, horizontal pod autoscaling, service discovery, and rolling updates, and is the dominant industry standard for container orchestration.
- When it fits (Applicability):
  - Microservices at scale across multiple nodes
  - CI/CD with automated rollouts and rollbacks
  - Multi-cloud / hybrid cloud portability
  - Stateful workloads with persistent volumes
  - AI/ML training and inference pipelines
  - Edge deployments (K3s, MicroK8s)
- Pros and Cons:
| Pros | Cons |
|---|---|
| Cloud-agnostic, runs anywhere | Steep learning curve |
| Self-healing, auto-scaling | Complex networking (CNI plugins) |
| Massive CNCF ecosystem | Control plane overhead for small workloads |
| Declarative desired-state model | YAML verbosity |
| Service mesh, Ingress, Gateway API | Security hardening requires expertise |
| GPU/DRA scheduling for AI/ML | etcd operational complexity |
| Every major cloud offers managed K8s | Not ideal for traditional VM workloads |
Architecture
```mermaid
flowchart TB
    subgraph ControlPlane["Control Plane"]
        API["kube-apiserver\n(REST + gRPC)"]
        ETCD["etcd\n(distributed KV store)"]
        Sched["kube-scheduler\n(pod placement)"]
        CM["kube-controller-manager\n(reconciliation loops)"]
        CCM["cloud-controller-manager\n(cloud API integration)"]
    end
    subgraph WorkerNode["Worker Node"]
        Kubelet["kubelet\n(pod lifecycle)"]
        KProxy["kube-proxy\n(Service networking)"]
        CRI["Container Runtime\n(containerd / CRI-O)"]
        Pods["Pods\n(application containers)"]
    end
    API <-->|"watch/list"| ETCD
    Sched -->|"bind pod"| API
    CM -->|"reconcile"| API
    CCM -->|"cloud ops"| API
    Kubelet -->|"status"| API
    API -->|"spec"| Kubelet
    Kubelet -->|"CRI"| CRI
    CRI --> Pods
    KProxy -->|"iptables/IPVS"| Pods
    style ControlPlane fill:#326ce5,color:#fff
    style WorkerNode fill:#1565c0,color:#fff
```
Key Features
| Feature | Detail |
|---|---|
| Pod Scheduling | Affinity, anti-affinity, taints, tolerations, topology spread |
| Auto-Scaling | HPA (horizontal), VPA (vertical), Cluster Autoscaler |
| Service Discovery | ClusterIP, NodePort, LoadBalancer, ExternalName |
| Ingress / Gateway API | L7 traffic routing, TLS termination |
| Storage | PV, PVC, CSI drivers, StorageClasses |
| ConfigMaps / Secrets | Externalized configuration and credentials |
| RBAC | Fine-grained role-based access control |
| Namespaces | Logical cluster partitioning |
| DRA (v1.36) | Dynamic Resource Allocation for GPUs, FPGAs |
| Custom Resources | Extend API with CRDs + Operators |
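
As an illustration of the HPA entry in the table, the documented scaling rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance of 1.0 (10% by default). The sketch below applies that rule in Python; the function name, parameter names, and the min/max defaults are assumptions for illustration, and real HPA behavior also involves stabilization windows and readiness handling that are not modeled here.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the basic HPA ratio rule and clamp to the configured bounds."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go below min_replicas or above max_replicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 60% target scale out to 6.
scaled_out = hpa_desired_replicas(4, 90, 60)
# 63% against a 60% target is inside the 10% tolerance, so nothing changes.
unchanged = hpa_desired_replicas(4, 63, 60)
```

Because the result is proportional to the metric ratio, a workload running at half its target is scaled to half its replicas in one step, subject to the bounds.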
Key Ecosystem
| Category | Tools |
|---|---|
| Managed K8s | EKS, GKE, AKS, DOKS, OKE, Linode LKE |
| Lightweight | K3s, MicroK8s, Kind, Minikube |
| Networking | Calico, Cilium, Flannel, Antrea |
| Service Mesh | Istio, Linkerd, Consul Connect |
| GitOps | ArgoCD, Flux |
| Observability | Prometheus, Grafana, OpenTelemetry |
| Security | Falco, OPA/Gatekeeper, Trivy, Kyverno |
Pricing
| Offering | Cost | Notes |
|---|---|---|
| Self-hosted | Free (Apache 2.0) | You manage everything |
| AWS EKS | $0.10/hr/cluster + node costs | Managed control plane |
| GKE Autopilot | $0.10/hr/cluster + pod costs | Fully managed |
| Azure AKS | Free control plane + node costs | Managed |
| Enterprise | Various (Rancher, OpenShift, Tanzu) | Support + add-ons |
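
To put the hourly control-plane fees in the table into monthly terms, a rough estimate can be computed as below. This is a sketch under stated assumptions: 730 hours per month (8,760 / 12), and a node price of $0.10/hr that is purely hypothetical and not taken from any provider's price list.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_cost(cluster_fee_per_hour, node_count, node_price_per_hour):
    """Estimate monthly cost in USD: control-plane fee plus node charges."""
    control_plane = cluster_fee_per_hour * HOURS_PER_MONTH
    nodes = node_count * node_price_per_hour * HOURS_PER_MONTH
    return round(control_plane + nodes, 2)


# Control plane only, at the $0.10/hr/cluster fee listed in the table.
control_plane_only = monthly_cost(0.10, 0, 0.0)

# Three hypothetical $0.10/hr nodes, with and without a control-plane fee.
with_fee = monthly_cost(0.10, 3, 0.10)
free_control_plane = monthly_cost(0.0, 3, 0.10)
```

At $0.10/hr the control plane alone comes to about $73 per month per cluster, which is small next to node costs for large clusters but noticeable when many small clusters are created.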
Compatibility
| Dimension | Support |
|---|---|
| Container runtimes | containerd (default), CRI-O |
| Node OS | Linux (primary), Windows (worker nodes) |
| CPU architecture | amd64, arm64, arm/v7, s390x, ppc64le |
| Storage | CSI (Ceph, EBS, GCE PD, Azure Disk, NFS, etc.) |
| Networking | CNI plugins (Calico, Cilium, Flannel, etc.) |
| Infrastructure | Bare metal, VMs, any cloud, edge |
Scale Limits (Upstream)
| Dimension | Limit |
|---|---|
| Nodes per cluster | 5,000 |
| Pods per node | 110 (default) |
| Pods per cluster | 150,000 |
| Services per cluster | 10,000 |
| Namespaces per cluster | 10,000 |
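
The limits in the table interact: 5,000 nodes at 110 pods each would be 550,000 pods, well above the 150,000 cluster-wide threshold. The sketch below checks a planned cluster against the table values; `check_plan` and the message strings are hypothetical helpers for illustration, not part of any Kubernetes tooling.

```python
# Values copied from the Scale Limits (Upstream) table.
UPSTREAM_LIMITS = {
    "nodes": 5_000,
    "pods_per_node": 110,
    "pods": 150_000,
    "services": 10_000,
    "namespaces": 10_000,
}


def check_plan(nodes, pods, services=0, namespaces=0):
    """Return a list of violations for a planned cluster (empty if none)."""
    problems = []
    planned = {"nodes": nodes, "pods": pods,
               "services": services, "namespaces": namespaces}
    for key, value in planned.items():
        if value > UPSTREAM_LIMITS[key]:
            problems.append(f"{key}: {value} exceeds {UPSTREAM_LIMITS[key]}")
    # The per-node default caps how many pods the planned nodes can hold.
    capacity = nodes * UPSTREAM_LIMITS["pods_per_node"]
    if pods > capacity:
        problems.append(f"pods: {pods} exceeds node capacity {capacity}")
    return problems


# At the node limit, the cluster-wide pod threshold allows 30 pods per node on average.
avg_pods_per_node_at_max = UPSTREAM_LIMITS["pods"] // UPSTREAM_LIMITS["nodes"]
```

In practice this means very large clusters run far below the per-node default on average, while small clusters hit the per-node limit first.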
Related Topics
Sources
| Source | URL | Retrieved Via |
|---|---|---|
| Official Website | https://kubernetes.io | Direct |
| Documentation | https://kubernetes.io/docs/ | Direct |
| GitHub Repository | https://github.com/kubernetes/kubernetes | Direct |
| Releases | https://kubernetes.io/releases/ | Web Search |
| Release Cycle | https://kubernetes.dev | Web Search |
| API Reference | https://kubernetes.io/docs/reference/kubernetes-api/ | Direct |
| CNCF | https://cncf.io | Direct |
| CNCF Landscape | https://landscape.cncf.io | Direct |
| Scalability Targets | https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md | Direct |
| Ingress NGINX Retirement | https://kubernetes.io/blog/ | Web Search |
Questions
Open Questions
Answered Questions
- How does v1.36 SELinuxMount GA affect pod startup latency in enforcing environments? → SELinuxMount (KEP-1710) mounts volumes with the correct SELinux label via a mount option (`context=...`) instead of recursively relabeling each file. In enforcing environments with large volumes (10K+ files), this reduces pod startup latency from minutes to seconds, because the kernel applies the label at mount time and the per-file `setxattr` syscall overhead disappears. Benchmarks show a 10-100x improvement for volumes with many small files. The `SELinuxMount` feature gate was introduced as alpha in v1.30 and reached beta in v1.33, so it was not GA in v1.30. No benchmark data specific to v1.36 is available yet. (Resolved via KEP-1710 documentation.)
- What is the production readiness of DRA (Dynamic Resource Allocation) for GPU partitioning in v1.36? → DRA with structured parameters (KEP-4381) was introduced as alpha in v1.30, moved to beta in v1.32, and its core API graduated to GA in v1.34, so the core is stable in v1.36. The NVIDIA DRA driver (nvidia/k8s-dra-driver) supports MIG partitioning and time-slicing, but driver maturity must be assessed separately from the core API. Where the driver is still considered experimental, the traditional NVIDIA device plugin with MIG/time-slicing remains the conservative path for production GPU workloads. (Resolved via KEP-4381 documentation.)
- How does the Gateway API adoption compare to the retired ingress-nginx in production environments? → Gateway API is the official successor to Ingress, and ingress-nginx was retired in March 2026. Multiple production-grade controllers exist: Envoy Gateway (CNCF), Istio Gateway, Cilium Gateway, Kong Gateway. Gateway API provides richer routing (traffic splitting, header matching, weighted backends) and a role-oriented design (Cluster Operator, Infrastructure Provider, Application Developer). Migration from Ingress is straightforward for basic use cases, but advanced patterns need rethinking. (Resolved via Kubernetes Gateway API documentation.)
- What are the recommended etcd backup strategies for clusters with >10,000 objects? → (1) Take periodic snapshots with `etcdctl snapshot save` every 6-12 hours; (2) tune `--snapshot-count` (for example `--snapshot-count=10000`), which sets how many committed transactions trigger an on-disk snapshot of the Raft log; (3) compact old revisions with `etcdctl compaction` so the database, and therefore each backup, does not grow without bound; (4) store snapshots in external object storage (S3/GCS); (5) for large clusters, run `etcdctl defrag` during maintenance windows; (6) test restores regularly on a separate cluster. Keep at least the last 2-3 snapshots. (Resolved via etcd documentation.)
- What is the max cluster size? → 5,000 nodes, 150,000 pods, 10,000 services (upstream thresholds). See infrastructure/kubernetes/index#Scale Limits (Upstream).
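- Is Docker still supported? → Dockershim was removed in v1.24, so the kubelet no longer talks to Docker Engine directly. containerd and CRI-O are the supported runtimes; images built with Docker still run because they are standard OCI images.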
- What happened to ingress-nginx? → Retired March 24, 2026. Migrate to Gateway API controllers.
- How does scheduling work? → Filter → Score → Bind. See infrastructure/kubernetes/architecture#How It Works.
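
The Filter → Score → Bind flow from the last answer can be sketched as follows. This is a deliberately simplified Python model, not the kube-scheduler implementation: the dictionary fields, the single CPU resource, the string-based taints, and the "most free CPU wins" scoring rule are all assumptions chosen for illustration, whereas the real scheduler runs a configurable set of filter and score plugins.

```python
def filter_nodes(pod, nodes):
    """Filter: keep nodes with enough free CPU whose taints the pod tolerates."""
    return [
        n for n in nodes
        if n["free_cpu"] >= pod["cpu"]
        and set(n["taints"]) <= set(pod["tolerations"])
    ]


def score(pod, node):
    """Score: prefer the node with the most CPU left after placement."""
    return node["free_cpu"] - pod["cpu"]


def schedule(pod, nodes):
    """Run Filter, Score, Bind; return the chosen node name or None."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None  # no feasible node: the pod would stay Pending
    best = max(feasible, key=lambda n: score(pod, n))
    best["free_cpu"] -= pod["cpu"]  # Bind: reserve the resources on that node
    return best["name"]


nodes = [
    {"name": "node-a", "free_cpu": 2.0, "taints": []},
    {"name": "node-b", "free_cpu": 8.0, "taints": ["gpu"]},
    {"name": "node-c", "free_cpu": 4.0, "taints": []},
]
web = {"name": "web", "cpu": 1.0, "tolerations": []}
trainer = {"name": "trainer", "cpu": 6.0, "tolerations": ["gpu"]}

placed_web = schedule(web, nodes)          # node-b is filtered out by its taint
placed_trainer = schedule(trainer, nodes)  # only node-b has 6 CPUs free
```

Filtering before scoring keeps the expensive ranking step limited to nodes that could actually run the pod, and a pod with no feasible node is left unbound rather than placed badly.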