Architecture

Overview

Cilium is an eBPF-based CNI plugin for Kubernetes that provides networking, security, and observability. Unlike traditional CNI plugins that rely on iptables, Cilium runs eBPF programs inside the Linux kernel to handle packet forwarding, policy enforcement, load balancing, and tracing. It supports both tunnel (VXLAN/Geneve) and direct routing modes, and can optionally replace kube-proxy entirely.

See also: security for identity-based policies, transparent encryption, and Tetragon.


Component Diagram

graph TB
    subgraph "Kubernetes Control Plane"
        KAPI["Kubernetes API Server"]
    end

    subgraph "cilium-operator (Deployment)"
        OPER["Cilium Operator<br/>IPAM, CRD management,<br/>resource garbage collection"]
    end

    subgraph "cilium-agent DaemonSet — Node 1"
        AGENT1["Cilium Agent<br/>compiles & loads eBPF,<br/>watches K8s resources"]
        BPF1["eBPF Programs<br/>TC + XDP + socket-level"]
        MAPS1["BPF Maps<br/>endpoints, services,<br/>identities, policies"]
        CNI1["CNI plugin"]
    end

    subgraph "cilium-agent DaemonSet — Node 2"
        AGENT2["Cilium Agent"]
        BPF2["eBPF Programs"]
        MAPS2["BPF Maps"]
        CNI2["CNI plugin"]
    end

    subgraph "Observability Stack"
        HUB_RELAY["Hubble Relay<br/>aggregates flows from all nodes"]
        HUB_UI["Hubble UI<br/>service map visualization"]
    end

    subgraph "Runtime Security"
        TETRA["Tetragon<br/>eBPF-based process &<br/>file observability"]
    end

    KAPI -->|watches pods, services, policies| AGENT1
    KAPI -->|watches pods, services, policies| AGENT2
    KAPI -->|CRD management| OPER
    OPER -->|IPAM allocation| AGENT1
    OPER -->|IPAM allocation| AGENT2
    AGENT1 -->|loads into kernel| BPF1
    AGENT1 -->|populates| MAPS1
    AGENT2 -->|loads into kernel| BPF2
    AGENT2 -->|populates| MAPS2
    BPF1 -->|"flow events"| HUB_RELAY
    BPF2 -->|"flow events"| HUB_RELAY
    HUB_RELAY -->|"API queries"| HUB_UI
    TETRA -->|"kernel tracepoints"| AGENT1
    TETRA -->|"kernel tracepoints"| AGENT2

Core Components

Cilium Agent

The Cilium agent (cilium-agent) runs as a DaemonSet on every node. It is the central control-plane component per node:

  • eBPF program management -- compiles, loads, and attaches eBPF programs to TC (traffic control), XDP, and socket hooks. Programs are regenerated when an endpoint's configuration changes; most policy and service updates are applied by updating BPF maps rather than by recompiling.
  • Endpoint tracking -- watches the Kubernetes API for pod creation/deletion, creates Cilium endpoints, and assigns security identities.
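  • Identity allocation -- groups pods with identical security-relevant labels into a shared numeric security identity. In tunnel mode the identity travels with the packet in the encapsulation header; in native-routing mode the receiving node derives it from the source IP through the ipcache map. Either way, policy is written against identities rather than IP addresses.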
  • Service load balancing -- programs BPF maps for ClusterIP, NodePort, LoadBalancer, and ExternalIP services. In kube-proxy replacement mode, this entirely replaces iptables-based service routing.
  • Policy enforcement -- translates CiliumNetworkPolicy and Kubernetes NetworkPolicy into eBPF rules attached to TC hooks.
  • Routing -- programs kernel routes for direct-routing mode, or manages VXLAN/Geneve tunnel interfaces for overlay mode.
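
The identity allocation described in the list above can be sketched as follows. This is a minimal illustration, not Cilium's allocator: the class name, the starting value of 256, and the in-memory dictionary are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]


class IdentityAllocator:
    """Toy allocator: one numeric identity per distinct label set."""

    def __init__(self, first_id: int = 256) -> None:
        # The starting value is arbitrary in this sketch (hypothetical).
        self._next_id = first_id
        self._by_labels: Dict[Labels, int] = {}

    def identity_for(self, labels: Dict[str, str]) -> int:
        # A frozenset makes the key independent of label ordering.
        key: Labels = frozenset(labels.items())
        if key not in self._by_labels:
            self._by_labels[key] = self._next_id
            self._next_id += 1
        return self._by_labels[key]


alloc = IdentityAllocator()
web_a = alloc.identity_for({"app": "web", "env": "prod"})
web_b = alloc.identity_for({"env": "prod", "app": "web"})  # same labels, other order
db = alloc.identity_for({"app": "db", "env": "prod"})
```

Two pods with the same labels share one identity, so a policy that names that identity covers both without listing their IPs.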

Cilium Operator

The Cilium Operator runs as a Deployment (typically 1--2 replicas) and handles cluster-wide coordination:

  • IPAM management -- allocates CIDR blocks to nodes in cluster-pool IPAM mode. Delegates to the Kubernetes Node resource in Kubernetes IPAM mode.
  • CRD lifecycle -- manages CiliumIdentity, CiliumEndpoint, CiliumNode, and other CRDs.
  • Garbage collection -- cleans up stale resources (orphaned identities, stale endpoints).
  • Cross-node coordination -- handles ClusterMesh endpoint synchronization.

CNI Plugin

The CNI binary (cilium-cni) is invoked by the container runtime, on the kubelet's behalf, when a pod sandbox is created. It:

  1. Calls the Cilium agent API to create an endpoint for the pod.
  2. Configures the veth pair and moves one end into the pod's network namespace.
  3. Triggers eBPF program recompilation if new policy or identity rules are needed.

eBPF Datapath

Cilium's datapath is built entirely on eBPF. Programs are attached at multiple hook points in the kernel networking stack.

graph TD
    subgraph "Packet Flow — Ingress Path"
        NIC["Network Interface<br/>(eth0)"]
        XDP["XDP Hook<br/>eBPF program<br/>(DDoS mitigation,<br/>early drop)"]
        TC_ING["TC Ingress Hook<br/>eBPF program<br/>(policy check,<br/>load balancing,<br/>routing)"]
        L3["L3 Routing<br/>(kernel FIB)"]
        VETH["Pod veth pair"]
    end

    subgraph "Packet Flow — Egress Path"
        POD_EG["Pod sends packet"]
        TC_EG["TC Egress Hook<br/>eBPF program<br/>(policy check,<br/>NAT, SNAT)"]
        SOCK_LB["Socket-level LB<br/>cgroup BPF sock hooks<br/>(connect-time LB,<br/>skips NAT)"]
    end

    subgraph "BPF Maps (shared state)"
        EP_MAP["Endpoint Map<br/>pod IP -> identity + MAC"]
        SVC_MAP["Service Map<br/>ClusterIP -> backend IPs"]
        POL_MAP["Policy Map<br/>identity -> allow/deny rules"]
        CT_MAP["Connection Tracking<br/>Map"]
    end

    NIC -->|"raw packet"| XDP
    XDP -->|"pass"| TC_ING
    TC_ING -->|"lookup"| EP_MAP
    TC_ING -->|"service resolve"| SVC_MAP
    TC_ING -->|"policy check"| POL_MAP
    TC_ING -->|"allowed"| L3
    L3 -->|"local delivery"| VETH

    POD_EG -->|"packet"| TC_EG
    POD_EG -->|"connect()"| SOCK_LB
    SOCK_LB -->|"direct to backend"| DEST["Destination Pod"]
    TC_EG -->|"lookup"| CT_MAP
    TC_EG -->|"policy check"| POL_MAP
    TC_EG -->|"forward"| NIC

eBPF Hook Points

| Hook Point | Attachment | Purpose |
| --- | --- | --- |
| XDP (eXpress Data Path) | Network driver, before skb allocation | Early packet drop, DDoS mitigation, Direct Server Return (DSR), XDP acceleration for LoadBalancer services |
| TC ingress | qdisc on physical and veth interfaces | Policy enforcement, service load balancing, routing decisions |
| TC egress | qdisc on physical and veth interfaces | SNAT, policy enforcement, tunnel encapsulation |
| Socket-level BPF | cgroup attach | Connect-time load balancing (bypasses NAT entirely), socket redirect |
| cgroup/connect | cgroup | Intercept connect() syscalls to resolve ClusterIP to backend pod IP |

BPF Maps

BPF maps are the shared data structures between the Cilium agent (userspace) and eBPF programs (kernel):

  • Endpoint map -- maps pod IPs to security identities, MAC addresses, and interface indices.
  • Service map -- stores ClusterIP/NodePort/LoadBalancer frontends and their backend pod IPs.
  • Policy map -- stores allow/deny rules keyed by security identity pairs.
  • Connection tracking map -- tracks active connections for stateful policy enforcement and NAT.
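  • Identity (ipcache) map -- maps IP addresses and CIDRs to numeric security identities. The mapping from an identity to its labels is kept in userspace and in CiliumIdentity resources rather than in the datapath.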
  • Tunnel map -- maps remote node IPs to tunnel endpoints (in VXLAN/Geneve mode).
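
How a datapath program might combine these maps can be sketched with plain dictionaries. This is a simplified model under stated assumptions: the map contents, the key layout (source identity, destination identity, destination port), and the default-deny behaviour are illustrative, and real BPF maps use fixed-size binary keys and values.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins for BPF maps.
endpoint_map: Dict[str, int] = {"10.0.1.5": 1001, "10.0.1.6": 1002}  # pod IP -> identity
policy_map: Dict[Tuple[int, int, int], str] = {
    (1001, 1002, 8080): "allow",  # (src identity, dst identity, dst port) -> verdict
}


def verdict(src_ip: str, dst_ip: str, dst_port: int) -> str:
    """Resolve both IPs to identities, then look up the policy verdict."""
    src_id: Optional[int] = endpoint_map.get(src_ip)
    dst_id: Optional[int] = endpoint_map.get(dst_ip)
    if src_id is None or dst_id is None:
        return "drop"  # unknown endpoint in this toy model
    # Default-deny: anything without an explicit entry is dropped.
    return policy_map.get((src_id, dst_id, dst_port), "drop")
```

For example, `verdict("10.0.1.5", "10.0.1.6", 8080)` resolves to the allow entry, while the reverse direction has no entry and is dropped.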

BPF map memory

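BPF maps have fixed maximum sizes, and depending on configuration their memory can be pre-allocated in the kernel. At large scale (10,000+ pods), the endpoint map and connection tracking map can consume significant memory. Tune map sizes via Helm values such as bpf.ctTcpMax, bpf.ctAnyMax, bpf.policyMapMax, or bpf.mapDynamicSizeRatio; check the Helm reference for your Cilium version, since value names and defaults change between releases.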
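
A rough upper bound on a map's footprint is the entry count multiplied by the per-entry size. The sketch below shows the arithmetic only; the entry count and the key and value sizes are hypothetical numbers chosen for illustration, not Cilium's actual struct sizes.

```python
def map_memory_bytes(max_entries: int, key_size: int, value_size: int,
                     per_entry_overhead: int = 0) -> int:
    """Upper-bound estimate for a map sized for max_entries."""
    return max_entries * (key_size + value_size + per_entry_overhead)


# Hypothetical figures: 512,000 entries, 40-byte keys, 56-byte values.
ct_bytes = map_memory_bytes(max_entries=512_000, key_size=40, value_size=56)
ct_mib = ct_bytes / (1024 * 1024)
```

With these assumed figures the estimate comes to about 46.9 MiB per node for that one map, before any kernel bookkeeping overhead.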


Hubble

Hubble is Cilium's built-in network observability platform:

  • Hubble agent -- embedded in the Cilium agent. Serves flow events (L3/L4/L7 metadata, policy verdicts) over a gRPC API on port 4244.
  • Hubble Relay -- a Deployment that connects to all per-node Hubble agents and aggregates flows cluster-wide. Exposes a unified gRPC API on port 4245.
  • Hubble UI -- web interface providing a service dependency map, flow table, and policy visualization.
  • Hubble CLI -- command-line tool for querying flows (hubble observe), filtering by pod, namespace, identity, verdict, or L7 protocol.

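Hubble can see DNS queries, HTTP requests/responses, gRPC calls, TCP handshake details, and policy drop reasons -- all without application changes or sidecars. L7 details (DNS, HTTP, gRPC) appear only for traffic that passes through Cilium's L7 proxy, which typically requires an L7 policy or visibility setting for the workloads in question.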
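
The cluster-wide aggregation that Hubble Relay performs can be pictured as merging per-node flow streams by timestamp. This is a conceptual sketch only: the Flow type and its fields are hypothetical, and the real Relay works over gRPC streams rather than in-memory lists.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Flow(NamedTuple):
    timestamp: float
    node: str
    verdict: str
    summary: str


def aggregate(per_node: Iterable[Iterable[Flow]]) -> Iterator[Flow]:
    """Merge per-node streams (each already time-ordered) into one stream."""
    return heapq.merge(*per_node, key=lambda f: f.timestamp)


node1 = [Flow(1.0, "node-1", "FORWARDED", "a -> b:80"),
         Flow(3.0, "node-1", "DROPPED", "a -> c:443")]
node2 = [Flow(2.0, "node-2", "FORWARDED", "d -> e:53")]

merged: List[Flow] = list(aggregate([node1, node2]))
# Filtering by verdict, similar in spirit to filtering with the Hubble CLI.
dropped = [f for f in merged if f.verdict == "DROPPED"]
```

Because each node's stream is already ordered, a streaming merge keeps the combined view ordered without buffering every flow.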


Tetragon

Tetragon is a separate eBPF-based runtime security component that can run alongside Cilium:

  • Process execution monitoring -- traces execve(), fork(), and other process lifecycle events via kernel tracepoints.
  • File access monitoring -- observes file open, read, write, and permission change events.
  • Network-level observability -- tracks TCP connects, accepts, and socket-level events at the kernel level.
  • Tracing policies -- defines what events to monitor via TracingPolicy CRDs with eBPF-based filtering.
  • Real-time enforcement -- can block suspicious process executions or file accesses in real time.

Tetragon operates independently of the Cilium CNI datapath. It attaches to kernel tracepoints and kprobes rather than TC/XDP hooks.


ClusterMesh

ClusterMesh enables multi-cluster networking by connecting multiple Cilium-managed clusters:

  • Cross-cluster pod communication -- pods in different clusters can communicate using their original pod IPs.
  • Shared service discovery -- Kubernetes services can have backends in multiple clusters.
  • Global identity -- security identities are synchronized across clusters so policies apply consistently.
  • etcd key-value store -- each cluster exposes its Cilium etcd data to peer clusters for endpoint synchronization.

graph LR
    subgraph "Cluster A"
        CA_AGENT["Cilium Agents"]
        CA_ETCD["Cilium etcd<br/>(endpoint data)"]
    end

    subgraph "Cluster B"
        CB_AGENT["Cilium Agents"]
        CB_ETCD["Cilium etcd"]
    end

    CA_AGENT -->|"sync endpoints"| CA_ETCD
    CB_AGENT -->|"sync endpoints"| CB_ETCD
    CA_ETCD <-->|"cross-cluster<br/>read-only"| CB_AGENT
    CB_ETCD <-->|"cross-cluster<br/>read-only"| CA_AGENT

IPAM Modes

| Mode | How it works | When to use |
| --- | --- | --- |
| Kubernetes | Delegates IP allocation to the Kubernetes Node resource .spec.podCIDR. Each node gets a CIDR from the cluster CIDR range. | Standard Kubernetes clusters where kube-controller-manager allocates CIDRs |
| Cluster Pool (default) | Cilium Operator allocates CIDR blocks from a configured pool and assigns them to nodes via CiliumNode CRDs. | Environments where kube-controller-manager does not allocate CIDRs |
| ENI | Allocates AWS Elastic Network Interface IPs directly to pods. Each pod gets a VPC-native IP. | AWS EKS or self-managed AWS clusters |
| Azure IPAM | Allocates IPs from Azure VNet subnets directly to pods. | Azure AKS |
| Multi-Pool | Supports multiple IP pools with different CIDR ranges, allowing per-pod or per-namespace pool selection. | Advanced use cases requiring multiple IP ranges |
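
The Cluster Pool behaviour, carving one per-node CIDR out of a larger pool, can be sketched with the standard ipaddress module. The class name, pool range, and /24 node prefix are assumptions for the example; the real operator records assignments in CiliumNode resources rather than in memory.

```python
import ipaddress
from typing import Dict, Iterator


class ClusterPool:
    """Hands out one fixed-size pod CIDR per node from a larger pool."""

    def __init__(self, pool_cidr: str, node_prefix_len: int) -> None:
        pool = ipaddress.ip_network(pool_cidr)
        self._free: Iterator = pool.subnets(new_prefix=node_prefix_len)
        self._assigned: Dict[str, ipaddress.IPv4Network] = {}

    def allocate(self, node: str):
        # Idempotent: a node keeps the CIDR it was given first.
        if node not in self._assigned:
            self._assigned[node] = next(self._free)  # StopIteration when exhausted
        return self._assigned[node]


pool = ClusterPool("10.0.0.0/16", node_prefix_len=24)
cidr_a = pool.allocate("node-a")
cidr_b = pool.allocate("node-b")
```

Here node-a receives 10.0.0.0/24 and node-b receives 10.0.1.0/24; a /16 pool split into /24 blocks supports at most 256 nodes, which is the kind of limit worth checking when choosing pool sizes.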

kube-proxy Replacement

Cilium can fully replace kube-proxy by handling all Kubernetes service types via eBPF:

| Service Type | kube-proxy (iptables) | Cilium (eBPF) |
| --- | --- | --- |
| ClusterIP | iptables DNAT rules | TC eBPF + socket-level connect-time LB |
| NodePort | iptables DNAT + kube-proxy port binding | TC eBPF on host interfaces |
| LoadBalancer | iptables DNAT via kube-proxy | XDP acceleration + TC eBPF |
| ExternalIPs | iptables DNAT | TC eBPF |
| HostPort | portmap CNI plugin | TC eBPF |

Performance advantage

Cilium's connect-time load balancing intercepts connect() at the socket level and resolves the ClusterIP to a backend pod IP before any packet is sent. Pod-to-service traffic handled this way needs no per-packet NAT, which reduces conntrack table pressure.
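
The idea of resolving a service address once, at connect() time, can be sketched as an address translation function. The service map contents and the random backend choice are assumptions for illustration; the real logic runs as an eBPF program attached to cgroup socket hooks.

```python
import random
from typing import Dict, List, Tuple

Addr = Tuple[str, int]

# Hypothetical service map: frontend (ClusterIP, port) -> backend pod addresses.
service_map: Dict[Addr, List[Addr]] = {
    ("10.96.0.10", 80): [("10.0.1.5", 8080), ("10.0.2.7", 8080)],
}


def translate_connect(dst: Addr, rng: random.Random) -> Addr:
    """Return the address the socket should actually connect to."""
    backends = service_map.get(dst)
    if not backends:
        return dst  # not a service address: leave untouched
    return rng.choice(backends)  # chosen once per connection, not per packet


rng = random.Random(0)
chosen = translate_connect(("10.96.0.10", 80), rng)
passthrough = translate_connect(("10.0.9.9", 443), rng)
```

Because the socket is connected to the backend address directly, later packets on that connection carry the real destination and need no rewriting.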

Migration path: Set kubeProxyReplacement=true in Helm values. Older releases used the values strict, partial, and disabled instead; partial let Cilium run alongside kube-proxy in a hybrid mode and take over service handling gradually.


Routing Modes

| Mode | Description | When to use |
| --- | --- | --- |
| Tunnel (VXLAN/Geneve) | Encapsulates pod traffic in VXLAN or Geneve overlays between nodes. No underlying network dependency. | Cloud environments, any network fabric |
| Native Routing (direct) | Programs kernel routes for pod CIDRs. Requires L3 connectivity between node CIDRs. | Bare-metal, on-prem, cloud VPCs with custom routing |
| Hybrid | Native routing within the same subnet, tunneling across subnets. | Mixed environments |
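
The hybrid mode's per-destination choice can be sketched as a subnet membership test. The function name and the /24 node subnet are assumptions for the example; the actual decision is made by the agent when it programs routes and tunnel map entries.

```python
import ipaddress


def forwarding_mode(local_node_ip: str, remote_node_ip: str,
                    subnet_prefix_len: int = 24) -> str:
    """Pick 'native' when both nodes share a subnet, else 'tunnel'."""
    local_net = ipaddress.ip_interface(
        f"{local_node_ip}/{subnet_prefix_len}"
    ).network
    remote = ipaddress.ip_address(remote_node_ip)
    return "native" if remote in local_net else "tunnel"
```

For example, `forwarding_mode("192.168.1.10", "192.168.1.20")` yields "native", while a peer at 192.168.2.20 falls outside the local /24 and yields "tunnel".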


How It Works

eBPF data plane internals, packet flow, Hubble observability pipeline, and Tetragon security enforcement.

eBPF Data Plane

Traditional CNIs use iptables — a linear chain of rules that becomes slower as rules grow (O(n)). Cilium replaces this with eBPF hash maps that provide constant-time O(1) lookups in kernel space.

flowchart LR
    subgraph Traditional["iptables-based CNI"]
        PKT1["Packet"] --> R1["Rule 1"] --> R2["Rule 2"] --> R3["Rule 3"] --> RN["Rule N\n(O(n) traversal)"]
    end

    subgraph Cilium_DP["Cilium eBPF Data Plane"]
        PKT2["Packet"] --> MAP["eBPF Hash Map\n(O(1) lookup)"] --> Action["Allow / Drop / Redirect"]
    end

    style Traditional fill:#c62828,color:#fff
    style Cilium_DP fill:#2e7d32,color:#fff
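
The contrast between ordered rule traversal and a keyed lookup can be sketched as follows. The rule format and the 1,000 generated rules are made up for the example; it models the lookup cost only, not iptables or eBPF themselves.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, int, str]  # (source IP, destination port, action)


def linear_match(rules: List[Rule], src: str, port: int) -> Tuple[str, int]:
    """Scan rules in order; return (action, number of rules examined)."""
    for examined, (r_src, r_port, action) in enumerate(rules, start=1):
        if (r_src, r_port) == (src, port):
            return action, examined
    return "drop", len(rules)


def hash_match(table: Dict[Tuple[str, int], str], src: str, port: int) -> str:
    """Single keyed lookup; cost does not grow with table size on average."""
    return table.get((src, port), "drop")


rules: List[Rule] = [(f"10.0.{i // 256}.{i % 256}", 80, "allow") for i in range(1000)]
table = {(s, p): a for s, p, a in rules}

# 10.0.3.231 is the last of the 1,000 generated rules.
action_linear, examined = linear_match(rules, "10.0.3.231", 80)
action_hash = hash_match(table, "10.0.3.231", 80)
```

Both lookups return the same action, but the linear scan examines all 1,000 rules to reach the last one while the hash lookup performs one probe.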

Packet Flow — Pod-to-Pod (Same Node)

sequenceDiagram
    participant PodA as Pod A
    participant VETH_A as veth (PodA)
    participant TC_A as TC eBPF (egress)
    participant TC_B as TC eBPF (ingress)
    participant VETH_B as veth (PodB)
    participant PodB as Pod B

    PodA->>VETH_A: Send packet
    VETH_A->>TC_A: eBPF TC hook (egress)
    TC_A->>TC_A: L3/L4 policy check (L7 via proxy redirect)
    TC_A->>TC_A: Conntrack lookup
    TC_A->>TC_B: Direct redirect (bpf_redirect)
    TC_B->>VETH_B: Deliver to PodB veth
    VETH_B->>PodB: Receive packet

    Note over TC_A,TC_B: Bypasses host network stack entirely

Packet Flow — Pod-to-Service (ClusterIP)

sequenceDiagram
    participant Pod as Client Pod
    participant eBPF as eBPF (socket/TC)
    participant CT as Conntrack Map
    participant SVC as Service Map
    participant Backend as Backend Pod

    Pod->>eBPF: connect() to ClusterIP:port
    eBPF->>SVC: Lookup Service → backend selection
    eBPF->>eBPF: DNAT to backend Pod IP
    eBPF->>CT: Create conntrack entry
    eBPF->>Backend: Forward packet directly

    Note over eBPF: Socket-level LB: resolved at connect(),<br/>not per-packet like kube-proxy/iptables

Hubble Observability Pipeline

flowchart TB
    subgraph Kernel["Kernel Space"]
        eBPF_H["eBPF Programs\n(TC, socket, XDP)"]
        PerfBuf["Perf Event Buffer\n(flow events)"]
    end

    subgraph Userspace["Hubble Stack"]
        HubbleAgent["Hubble Agent\n(per-node, embedded in cilium-agent)"]
        HubbleRelay["Hubble Relay\n(cluster-wide aggregation)"]
        HubbleUI["Hubble UI\n(service map, flow table)"]
    end

    subgraph Export["Export"]
        Prom["Prometheus\n(metrics)"]
        OTEL["OpenTelemetry\n(traces)"]
        SIEM["SIEM / Log\n(JSON export)"]
    end

    eBPF_H -->|"perf events"| PerfBuf
    PerfBuf --> HubbleAgent
    HubbleAgent -->|"gRPC"| HubbleRelay
    HubbleRelay --> HubbleUI
    HubbleRelay --> Prom
    HubbleRelay --> OTEL
    HubbleRelay --> SIEM

    style Kernel fill:#f9a825,color:#000
    style Userspace fill:#7b1fa2,color:#fff

Tetragon — Runtime Security

flowchart LR
    subgraph Kernel_T["Kernel"]
        LSM["LSM Hooks"]
        Kprobes["kprobes / tracepoints"]
        eBPF_T["Tetragon eBPF\nprograms"]
    end

    subgraph Userspace_T["Tetragon Agent"]
        PolicyEngine["Policy Engine\n(TracingPolicy CRDs)"]
        EventProc["Event Processor"]
    end

    subgraph Actions["Enforcement"]
        Log["Log event"]
        Kill["Kill process\n(SIGKILL)"]
        Alert["Send alert"]
    end

    LSM --> eBPF_T
    Kprobes --> eBPF_T
    eBPF_T --> EventProc
    PolicyEngine --> eBPF_T
    EventProc --> Log
    EventProc --> Kill
    EventProc --> Alert

    style Kernel_T fill:#c62828,color:#fff

Tetragon Use Cases

| Use Case | TracingPolicy Target |
| --- | --- |
| Detect container escape | Monitor setns, unshare syscalls |
| Block crypto mining | Kill processes connecting to mining pools |
| File integrity | Alert on writes to /etc/passwd, /etc/shadow |
| Network forensics | Log all TCP connections from a namespace |
| Privilege escalation | Detect setuid(0) calls |
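
The way a policy maps observed events to actions can be modelled in a few lines. This is a conceptual sketch and not TracingPolicy syntax: the Event and Rule types, the first-match rule order, and the action names are all hypothetical, and Tetragon performs its filtering in the kernel rather than in userspace code like this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    syscall: str
    pid: int
    arg: str = ""


@dataclass(frozen=True)
class Rule:
    syscall: str
    action: str  # "log", "alert", or "kill" in this sketch
    arg_prefix: Optional[str] = None

    def matches(self, ev: Event) -> bool:
        if ev.syscall != self.syscall:
            return False
        return self.arg_prefix is None or ev.arg.startswith(self.arg_prefix)


def decide(rules: List[Rule], ev: Event) -> str:
    """First matching rule wins; unmatched events are ignored."""
    for rule in rules:
        if rule.matches(ev):
            return rule.action
    return "ignore"


rules = [
    Rule("setns", "alert"),
    Rule("openat", "alert", arg_prefix="/etc/shadow"),
    Rule("setuid", "log"),
]
```

With these rules, a setns call raises an alert, an openat on /etc/shadow raises an alert, and an openat on any other path is ignored.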

eBPF Map Types Used

| Map Type | Purpose |
| --- | --- |
| Hash map | Policy rules, service → endpoint mapping |
| LRU hash | Conntrack entries (connection state) |
| Array | Per-CPU counters, configuration |
| Perf event array (per-CPU ring buffers) | Event export to Hubble |
| LPM trie | CIDR-based policy matching |
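
The longest-prefix-match semantics that an LPM trie provides for CIDR policy can be illustrated with the ipaddress module. The lookup below is a linear scan written for brevity, so it shows the matching rule rather than the trie data structure; the prefixes and labels are made up.

```python
import ipaddress
from typing import List, Optional, Tuple

Entry = Tuple[ipaddress.IPv4Network, str]


def lpm_lookup(entries: List[Entry], addr: str) -> Optional[str]:
    """Return the value of the most specific prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    best: Optional[Entry] = None
    for net, value in entries:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None


entries: List[Entry] = [
    (ipaddress.ip_network("0.0.0.0/0"), "world"),
    (ipaddress.ip_network("10.0.0.0/8"), "cluster"),
    (ipaddress.ip_network("10.1.2.0/24"), "team-a"),
]
```

An address such as 10.1.2.9 matches all three prefixes, and the /24 wins because it is the most specific; 10.9.9.9 falls back to the /8 and 8.8.8.8 to the default route.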


Benchmarks

Scope

Performance characteristics, scaling limits, and resource consumption for Cilium.

eBPF Performance

| Feature | Throughput | Latency | Notes |
| --- | --- | --- | --- |
| Pod-to-Pod (same node) | 40+ Gbps | 10-20 µs | XDP native |
| Pod-to-Pod (cross node) | 9.5+ Gbps | 50-100 µs | VXLAN/Geneve |
| Service load balancing | 9+ Gbps | 20-50 µs | Maglev hashing |
| kube-proxy replacement | 5-15% higher than iptables | 20-40% lower than iptables | eBPF socket-level |

Scaling Limits

| Dimension | Limit | Notes |
| --- | --- | --- |
| Nodes per cluster | 5,000+ | Tested by Isovalent |
| Endpoints per node | 1,000+ | eBPF map capacity |
| Network policies | 100,000+ | CiliumNetworkPolicy |
| Identities (security) | 65,535 | Per-cluster identity space |

Resource Consumption

| Cluster Size | Agent CPU | Agent Memory | Operator Memory |
| --- | --- | --- | --- |
| < 100 nodes | 100-200m | 256Mi | 128Mi |
| 100-500 nodes | 200-500m | 512Mi | 256Mi |
| 500+ nodes | 500m-1000m | 1Gi+ | 512Mi |

Sourcing Status

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.
