# Kubernetes — How It Works

Desired-state reconciliation, pod lifecycle, scheduling, networking, and storage internals.
## Desired-State Reconciliation Loop

```mermaid
sequenceDiagram
    participant User as User / CI
    participant API as kube-apiserver
    participant ETCD as etcd
    participant Ctrl as Controller Manager
    participant Sched as Scheduler
    participant KL as kubelet (Node)
    participant CRI as containerd
    User->>API: kubectl apply -f deployment.yaml
    API->>ETCD: Store desired state
    Ctrl->>API: Watch: new Deployment
    Ctrl->>API: Create ReplicaSet
    Ctrl->>API: Create Pod specs
    Sched->>API: Watch: unscheduled Pods
    Sched->>Sched: Score nodes (resources, affinity, taints)
    Sched->>API: Bind Pod → Node
    KL->>API: Watch: Pod assigned to my node
    KL->>CRI: Create container sandbox
    CRI->>CRI: Pull image, start containers
    KL->>API: Update Pod status: Running
    loop Reconciliation
        Ctrl->>API: Watch: actual vs desired state
        Ctrl->>Ctrl: If replicas < desired → create more Pods
        Ctrl->>Ctrl: If replicas > desired → delete surplus
    end
```
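The controller's reconcile step in the loop above can be sketched as a pure function over desired and actual state (illustrative Python, not the real client-go/controller-runtime API):

```python
# Minimal sketch of the reconcile step a ReplicaSet-style controller runs.
# Pod names and the list-based "cluster state" are illustrative only.

def reconcile(desired_replicas: int, actual_pods: list[str]) -> list[str]:
    """Drive actual state toward desired state; return the resulting pod list."""
    pods = list(actual_pods)
    while len(pods) < desired_replicas:      # replicas < desired -> create more
        pods.append(f"pod-{len(pods)}")
    while len(pods) > desired_replicas:      # replicas > desired -> delete surplus
        pods.pop()
    return pods

# Scale from 2 pods up to 4, then a separate reconcile down to 1.
print(reconcile(4, ["pod-0", "pod-1"]))
print(reconcile(1, ["pod-0", "pod-1", "pod-2"]))
```

The key property is idempotence: running `reconcile` again on its own output changes nothing, which is why controllers can safely re-run it on every watch event.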
## Pod Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Pending: Pod created
    Pending --> Running: Init containers complete, containers started
    Running --> Succeeded: All containers exit 0
    Running --> Failed: Container exits non-zero
    Running --> Unknown: Node unreachable
    Failed --> [*]: Not restarted
    Succeeded --> [*]: Job complete
    Unknown --> Running: Node recovers
    Unknown --> Failed: Node dead (grace period)
    state Running {
        [*] --> NotReady: Containers starting
        NotReady --> Ready: Readiness probe passes
        Ready --> NotReady: Readiness probe fails
    }
```

Note that init containers run sequentially while the Pod is still `Pending`; the Pod only enters `Running` once they have all completed and the main containers start.
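The legal phase transitions in the diagram can be encoded as a small lookup table (a sketch: the phase names match the real `PodStatus.phase` values, but the table structure is illustrative):

```python
# Allowed Pod phase transitions from the state diagram above.
ALLOWED = {
    "Pending":   {"Running"},
    "Running":   {"Succeeded", "Failed", "Unknown"},
    "Unknown":   {"Running", "Failed"},
    "Succeeded": set(),   # terminal
    "Failed":    set(),   # terminal
}

def can_transition(src: str, dst: str) -> bool:
    """True if a Pod may move directly from phase src to phase dst."""
    return dst in ALLOWED.get(src, set())
```

For example, `can_transition("Pending", "Running")` holds, while `can_transition("Succeeded", "Running")` does not: terminal phases never transition out.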
## Networking Model

### The 4 Networking Rules
- Pod-to-Pod: Every Pod gets its own IP. All Pods can communicate without NAT.
- Pod-to-Service: Services provide stable virtual IPs (ClusterIP) backed by iptables/IPVS rules.
- External-to-Service: LoadBalancer, NodePort, or Ingress/Gateway API expose services.
- Pod-to-External: Pods can reach external networks via SNAT.
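The Pod-to-Service rule amounts to a per-packet DNAT decision. A minimal sketch of what kube-proxy's iptables rules effectively compute (the addresses and dict layout are illustrative):

```python
import random

# Sketch: a packet addressed to a Service ClusterIP is DNAT'ed to one ready
# backend Pod IP, chosen roughly uniformly. Anything else routes unchanged
# (Pod-to-Pod traffic uses real Pod IPs, no NAT).
SERVICE = {"cluster_ip": "10.96.0.10", "endpoints": ["10.244.1.2", "10.244.2.2"]}

def dnat(dst_ip: str, service=SERVICE, rng=random) -> str:
    """Return the destination IP after Service virtual-IP rewriting."""
    if dst_ip == service["cluster_ip"] and service["endpoints"]:
        return rng.choice(service["endpoints"])  # pick one backend Pod
    return dst_ip  # not a Service VIP: deliver as-is
```

This is why a ClusterIP never answers pings on its own: it exists only as rewrite rules on each node, not as an interface anywhere.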
```mermaid
flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        subgraph Node1["Node 1"]
            P1["Pod A\n10.244.1.2"]
            P2["Pod B\n10.244.1.3"]
            KP1["kube-proxy\n(iptables/IPVS)"]
        end
        subgraph Node2["Node 2"]
            P3["Pod C\n10.244.2.2"]
            P4["Pod D\n10.244.2.3"]
            KP2["kube-proxy"]
        end
        SVC["Service: my-svc\nClusterIP: 10.96.0.10\n→ Pod A, Pod C"]
        CNI["CNI Plugin\n(Calico/Cilium/Flannel)\nPod-to-Pod routing"]
    end
    External["External\nTraffic"] -->|"LoadBalancer /\nNodePort"| SVC
    SVC -->|"iptables rules"| P1
    SVC -->|"iptables rules"| P3
    P1 <-->|"CNI overlay"| P3
    P2 <-->|"CNI overlay"| P4
    style Cluster fill:#326ce5,color:#fff
```
## Storage Architecture

```mermaid
flowchart LR
    Pod["Pod"] --> PVC["PersistentVolumeClaim\n(request: 10Gi)"]
    PVC --> PV["PersistentVolume\n(10Gi, RWO)"]
    PV --> SC["StorageClass\n(provisioner: ebs.csi)"]
    SC --> CSI["CSI Driver\n(EBS, GCE PD, Ceph)"]
    CSI --> Disk["Cloud Disk\nor Storage"]
    style PVC fill:#326ce5,color:#fff
```
## Scheduling Algorithm
| Phase | Operation |
|---|---|
| Filtering | Eliminate nodes that don't meet Pod requirements (resources, taints, affinity) |
| Scoring | Rank remaining nodes with plugins such as NodeResourcesFit (least-allocated), NodeResourcesBalancedAllocation, NodeAffinity, PodTopologySpread |
| Binding | Assign Pod to highest-scoring node |
| Preemption | If no node fits, evict lower-priority Pods |