
Kubernetes — How It Works

Desired-state reconciliation, pod lifecycle, scheduling, networking, and storage internals.

Desired-State Reconciliation Loop

```mermaid
sequenceDiagram
    participant User as User / CI
    participant API as kube-apiserver
    participant ETCD as etcd
    participant Ctrl as Controller Manager
    participant Sched as Scheduler
    participant KL as kubelet (Node)
    participant CRI as containerd

    User->>API: kubectl apply -f deployment.yaml
    API->>ETCD: Store desired state
    Ctrl->>API: Watch: new Deployment
    Ctrl->>API: Create ReplicaSet
    Ctrl->>API: Create Pod specs
    Sched->>API: Watch: unscheduled Pods
    Sched->>Sched: Score nodes (resources, affinity, taints)
    Sched->>API: Bind Pod → Node
    KL->>API: Watch: Pod assigned to my node
    KL->>CRI: Create container sandbox
    CRI->>CRI: Pull image, start containers
    KL->>API: Update Pod status: Running

    loop Reconciliation
        Ctrl->>API: Watch: actual vs desired state
        Ctrl->>Ctrl: If replicas < desired → create more Pods
        Ctrl->>Ctrl: If replicas > desired → delete surplus
    end
```
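The heart of the reconciliation loop is a simple diff between observed and desired state. A minimal sketch in Go (the `reconcile` function and replica counts are illustrative, not the real client-go API):

```go
package main

import "fmt"

// reconcile compares the observed replica count against the desired count
// and returns the delta the controller must act on: a positive result means
// "create that many Pods", a negative one means "delete the surplus".
// This mirrors the ReplicaSet controller's core loop in spirit only.
func reconcile(actual, desired int) int {
	return desired - actual
}

func main() {
	for _, tc := range []struct{ actual, desired int }{
		{2, 5}, // scale up
		{5, 3}, // scale down
		{3, 3}, // steady state
	} {
		switch delta := reconcile(tc.actual, tc.desired); {
		case delta > 0:
			fmt.Printf("create %d Pods\n", delta)
		case delta < 0:
			fmt.Printf("delete %d Pods\n", -delta)
		default:
			fmt.Println("in sync")
		}
	}
}
```

The real controllers never compute this diff once and stop: the `watch` on the API server re-triggers the comparison every time actual state drifts, which is what makes the system self-healing.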

Pod Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Pending: Pod created
    Pending --> Running: All containers started
    Running --> Succeeded: All containers exit 0
    Running --> Failed: Container exits non-zero
    Running --> Unknown: Node unreachable
    Failed --> [*]: Not restarted
    Succeeded --> [*]: Job complete
    Unknown --> Running: Node recovers
    Unknown --> Failed: Node dead (grace period)

    state Pending {
        [*] --> Init: Init containers run sequentially
        Init --> [*]: All init containers succeed
    }

    state Running {
        [*] --> Ready: Readiness probe passes
        Ready --> NotReady: Readiness probe fails
        NotReady --> Ready: Probe passes again
    }
```
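The terminal phases follow directly from container exit codes: every container exiting 0 yields Succeeded, any non-zero exit yields Failed. A simplified sketch (the real kubelet also weighs `restartPolicy` and probe state; the function name is illustrative):

```go
package main

import "fmt"

// podPhase derives a terminal Pod phase from container exit codes,
// mirroring the transitions in the lifecycle diagram: all zero exits
// yield Succeeded, any non-zero exit yields Failed.
func podPhase(exitCodes []int) string {
	for _, code := range exitCodes {
		if code != 0 {
			return "Failed"
		}
	}
	return "Succeeded"
}

func main() {
	fmt.Println(podPhase([]int{0, 0}))   // every container finished cleanly
	fmt.Println(podPhase([]int{0, 137})) // one container was killed (e.g. OOM)
}
```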

Networking Model

The 4 Networking Rules

  1. Pod-to-Pod: Every Pod gets its own IP. All Pods can communicate without NAT.
  2. Pod-to-Service: Services provide stable virtual IPs (ClusterIP) backed by iptables/IPVS rules.
  3. External-to-Service: LoadBalancer, NodePort, or Ingress/Gateway API expose services.
  4. Pod-to-External: Pods reach external networks via SNAT; the node rewrites the outgoing Pod IP to its own address.

```mermaid
flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        subgraph Node1["Node 1"]
            P1["Pod A\n10.244.1.2"]
            P2["Pod B\n10.244.1.3"]
            KP1["kube-proxy\n(iptables/IPVS)"]
        end

        subgraph Node2["Node 2"]
            P3["Pod C\n10.244.2.2"]
            P4["Pod D\n10.244.2.3"]
            KP2["kube-proxy"]
        end

        SVC["Service: my-svc\nClusterIP: 10.96.0.10\n→ Pod A, Pod C"]

        CNI["CNI Plugin\n(Calico/Cilium/Flannel)\nPod-to-Pod routing"]
    end

    External["External\nTraffic"] -->|"LoadBalancer /\nNodePort"| SVC
    SVC -->|"iptables rules"| P1
    SVC -->|"iptables rules"| P3
    P1 <-->|"CNI overlay"| P3
    P2 <-->|"CNI overlay"| P4

    style Cluster fill:#326ce5,color:#fff
```

Storage Architecture

```mermaid
flowchart LR
    Pod["Pod"] --> PVC["PersistentVolumeClaim\n(request: 10Gi)"]
    PVC --> PV["PersistentVolume\n(10Gi, RWO)"]
    PV --> SC["StorageClass\n(provisioner: ebs.csi)"]
    SC --> CSI["CSI Driver\n(EBS, GCE PD, Ceph)"]
    CSI --> Disk["Cloud Disk\nor Storage"]

    style PVC fill:#326ce5,color:#fff
```
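Binding a claim to a volume is essentially a matching problem: the controller picks the smallest available PersistentVolume that satisfies the claim's size and access mode, and falls back to dynamic provisioning via the StorageClass when nothing matches. A sketch under those assumptions (the `pv` struct and `bind` function are illustrative, not the real API types):

```go
package main

import "fmt"

// pv is a pared-down PersistentVolume: a name, a capacity in Gi, and an
// access mode such as "RWO" (ReadWriteOnce).
type pv struct {
	name       string
	capacityGi int
	accessMode string
}

// bind picks the smallest PV that satisfies the claim's size and access
// mode, mimicking the binding controller's best-fit behavior.
func bind(requestGi int, mode string, pvs []pv) (string, bool) {
	best := -1
	for i, v := range pvs {
		if v.accessMode != mode || v.capacityGi < requestGi {
			continue // filtered: wrong mode or too small
		}
		if best == -1 || v.capacityGi < pvs[best].capacityGi {
			best = i // keep the tightest fit so large PVs stay available
		}
	}
	if best == -1 {
		return "", false // no match: a StorageClass provisioner would create one
	}
	return pvs[best].name, true
}

func main() {
	pvs := []pv{
		{"pv-small", 5, "RWO"},
		{"pv-medium", 10, "RWO"},
		{"pv-large", 100, "RWO"},
	}
	name, ok := bind(10, "RWO", pvs)
	fmt.Println(name, ok) // the 10Gi claim gets pv-medium, not pv-large
}
```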

Scheduling Algorithm

| Phase | Operation |
| --- | --- |
| Filtering | Eliminate nodes that can't satisfy the Pod's requirements (resources, taints/tolerations, affinity) |
| Scoring | Rank the remaining nodes with plugins such as NodeResourcesFit (least-allocated), NodeResourcesBalancedAllocation, NodeAffinity, and PodTopologySpread |
| Binding | Assign the Pod to the highest-scoring node |
| Preemption | If no node fits, evict lower-priority Pods to make room |
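The filter-then-score pipeline above can be sketched with a single resource dimension. This toy version filters nodes that can't fit the Pod's CPU request, then scores survivors by remaining free capacity, in the spirit of least-allocated scoring (node data and the scoring formula are simplified illustrations, not the scheduler framework's actual plugin math):

```go
package main

import "fmt"

// node is a pared-down view of a cluster node: free and total CPU in
// millicores.
type node struct {
	name              string
	cpuFree, cpuTotal int
}

// schedule runs the two phases: Filtering drops nodes that cannot fit the
// Pod's CPU request; Scoring ranks survivors by the fraction of CPU left
// free after placement (scaled to 0-100, higher is better).
func schedule(podCPU int, nodes []node) (string, bool) {
	bestName, bestScore := "", -1
	for _, n := range nodes {
		if n.cpuFree < podCPU {
			continue // Filtering: node cannot fit the Pod
		}
		score := (n.cpuFree - podCPU) * 100 / n.cpuTotal
		if score > bestScore {
			bestName, bestScore = n.name, score
		}
	}
	return bestName, bestScore >= 0 // Binding would pin the Pod to bestName
}

func main() {
	nodes := []node{
		{"node-1", 500, 4000},  // filtered: can't fit a 1000m request
		{"node-2", 3000, 4000}, // survives and wins with the most headroom
	}
	name, ok := schedule(1000, nodes)
	fmt.Println(name, ok)
}
```

If the loop finds no feasible node at all (`ok == false`), that is the point where the real scheduler considers preemption, evicting lower-priority Pods to open up capacity.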
