Architecture¶

Overview¶

Cilium is an eBPF-based CNI plugin for Kubernetes that provides networking, security, and observability. Unlike traditional CNI plugins that rely on iptables, Cilium runs eBPF programs inside the Linux kernel to handle packet forwarding, policy enforcement, load balancing, and tracing. It supports both tunnel (VXLAN/Geneve) and direct routing modes, and can optionally replace kube-proxy entirely.

See also: security for identity-based policies, transparent encryption, and Tetragon.

Component Diagram¶

graph TB
    subgraph "Kubernetes Control Plane"
        KAPI["Kubernetes API Server"]
    end

    subgraph "cilium-operator (Deployment)"
        OPER["Cilium Operator<br/>IPAM, CRD management,<br/>resource garbage collection"]
    end

    subgraph "cilium-agent DaemonSet — Node 1"
        AGENT1["Cilium Agent<br/>compiles & loads eBPF,<br/>watches K8s resources"]
        BPF1["eBPF Programs<br/>TC + XDP + socket-level"]
        MAPS1["BPF Maps<br/>endpoints, services,<br/>identities, policies"]
        CNI1["CNI plugin"]
    end

    subgraph "cilium-agent DaemonSet — Node 2"
        AGENT2["Cilium Agent"]
        BPF2["eBPF Programs"]
        MAPS2["BPF Maps"]
        CNI2["CNI plugin"]
    end

    subgraph "Observability Stack"
        HUB_RELAY["Hubble Relay<br/>aggregates flows from all nodes"]
        HUB_UI["Hubble UI<br/>service map visualization"]
    end

    subgraph "Runtime Security"
        TETRA["Tetragon<br/>eBPF-based process &<br/>file observability"]
    end

    KAPI -->|watches pods, services, policies| AGENT1
    KAPI -->|watches pods, services, policies| AGENT2
    KAPI -->|CRD management| OPER
    OPER -->|IPAM allocation| AGENT1
    OPER -->|IPAM allocation| AGENT2
    AGENT1 -->|loads into kernel| BPF1
    AGENT1 -->|populates| MAPS1
    AGENT2 -->|loads into kernel| BPF2
    AGENT2 -->|populates| MAPS2
    BPF1 -->|"flow events"| HUB_RELAY
    BPF2 -->|"flow events"| HUB_RELAY
    HUB_RELAY -->|"API queries"| HUB_UI
    TETRA -->|"kernel tracepoints"| AGENT1
    TETRA -->|"kernel tracepoints"| AGENT2

Core Components¶

Cilium Agent¶

The Cilium agent (cilium-agent) runs as a DaemonSet on every node. It is the central control-plane component per node:

eBPF program management -- compiles, loads, and attaches eBPF programs to TC (traffic control), XDP, and socket hooks on network interfaces. Re-compiles and hot-swaps programs when policies or endpoints change.
Endpoint tracking -- watches the Kubernetes API for pod creation/deletion, creates Cilium endpoints, and assigns security identities.
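Identity allocation -- assigns a shared numeric security identity to all pods that carry the same set of security-relevant labels. In tunnel mode the identity travels in the encapsulation header; in native-routing mode the receiving node resolves it from the source IP through an eBPF map lookup. Either way, policy is enforced on identities rather than on IP-based rules.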
Service load balancing -- programs BPF maps for ClusterIP, NodePort, LoadBalancer, and ExternalIP services. In kube-proxy replacement mode, this entirely replaces iptables-based service routing.
Policy enforcement -- translates CiliumNetworkPolicy and Kubernetes NetworkPolicy into policy map entries that the eBPF programs at the TC hooks consult when deciding whether to forward or drop traffic.
Routing -- programs kernel routes for direct-routing mode, or manages VXLAN/Geneve tunnel interfaces for overlay mode.
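
The identity allocation step above can be illustrated with a small sketch. This is a toy model, not Cilium's allocator: the class name `IdentityAllocator`, the starting value 256, and the label sets are all assumptions chosen for illustration. It shows only the core idea that pods with the same labels share one numeric identity.

```python
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]


class IdentityAllocator:
    """Toy allocator: one numeric identity per distinct label set (hypothetical)."""

    def __init__(self, first_id: int = 256) -> None:
        # The starting value is arbitrary for this sketch.
        self._next_id = first_id
        self._by_labels: Dict[Labels, int] = {}

    def identity_for(self, labels: Dict[str, str]) -> int:
        # A frozenset makes the key independent of label ordering.
        key: Labels = frozenset(labels.items())
        if key not in self._by_labels:
            self._by_labels[key] = self._next_id
            self._next_id += 1
        return self._by_labels[key]


allocator = IdentityAllocator()
web_a = allocator.identity_for({"app": "web", "env": "prod"})
web_b = allocator.identity_for({"env": "prod", "app": "web"})  # same labels, other order
db = allocator.identity_for({"app": "db", "env": "prod"})
```

Because policy is expressed against the identity, adding a second `web` pod does not require any new policy entries in this model; it simply reuses the identity already allocated.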

Cilium Operator¶

The Cilium Operator runs as a Deployment (typically 1-2 replicas) and handles cluster-wide coordination:

IPAM management -- allocates CIDR blocks to nodes in cluster-pool IPAM mode. Delegates to the Kubernetes Node resource in Kubernetes IPAM mode.
CRD lifecycle -- manages CiliumIdentity, CiliumEndpoint, CiliumNode, and other CRDs.
Garbage collection -- cleans up stale resources (orphaned identities, stale endpoints).
Cross-node coordination -- handles ClusterMesh endpoint synchronization.

CNI Plugin¶

The CNI binary (cilium-cni) is invoked by the container runtime, at the kubelet's request, when a pod sandbox is created. It:

Calls the Cilium agent API to create an endpoint for the pod.
Configures the veth pair and moves one end into the pod's network namespace.
Triggers eBPF program recompilation if new policy or identity rules are needed.

eBPF Datapath¶

Cilium's datapath is built entirely on eBPF. Programs are attached at multiple hook points in the kernel networking stack.

graph TD
    subgraph "Packet Flow — Ingress Path"
        NIC["Network Interface<br/>(eth0)"]
        XDP["XDP Hook<br/>eBPF program<br/>(DDoS mitigation,<br/>early drop)"]
        TC_ING["TC Ingress Hook<br/>eBPF program<br/>(policy check,<br/>load balancing,<br/>routing)"]
        L3["L3 Routing<br/>(kernel FIB)"]
        VETH["Pod veth pair"]
    end

    subgraph "Packet Flow — Egress Path"
        POD_EG["Pod sends packet"]
        TC_EG["TC Egress Hook<br/>eBPF program<br/>(policy check,<br/>SNAT)"]
        SOCK_LB["Socket-level LB<br/>cgroup / BPF sock<br/>(connect-time LB,<br/>skips NAT)"]
    end

    subgraph "BPF Maps (shared state)"
        EP_MAP["Endpoint Map<br/>pod IP -> identity + MAC"]
        SVC_MAP["Service Map<br/>ClusterIP -> backend IPs"]
        POL_MAP["Policy Map<br/>identity -> allow/deny rules"]
        CT_MAP["Connection Tracking<br/>Map"]
    end

    NIC -->|"raw packet"| XDP
    XDP -->|"pass"| TC_ING
    TC_ING -->|"lookup"| EP_MAP
    TC_ING -->|"service resolve"| SVC_MAP
    TC_ING -->|"policy check"| POL_MAP
    TC_ING -->|"allowed"| L3
    L3 -->|"local delivery"| VETH

    POD_EG -->|"packet"| TC_EG
    POD_EG -->|"connect()"| SOCK_LB
    SOCK_LB -->|"direct to backend"| DEST["Destination Pod"]
    TC_EG -->|"lookup"| CT_MAP
    TC_EG -->|"policy check"| POL_MAP
    TC_EG -->|"forward"| NIC

eBPF Hook Points¶

Hook Point	Attachment	Purpose
XDP (eXpress Data Path)	Network driver, before skb allocation	Early packet drop, DDoS mitigation, XDP-accelerated LoadBalancer handling, including Direct Server Return (DSR)
TC ingress	qdisc on physical and veth interfaces	Policy enforcement, service load balancing, routing decisions
TC egress	qdisc on physical and veth interfaces	SNAT, policy enforcement, tunnel encapsulation
Socket-level BPF	cgroup attach	Connect-time load balancing (bypasses NAT entirely), socket redirect
cgroup/connect	cgroup	Intercept `connect()` syscalls to resolve ClusterIP to backend pod IP

BPF Maps¶

BPF maps are the shared data structures between the Cilium agent (userspace) and eBPF programs (kernel):

Endpoint map -- maps pod IPs to security identities, MAC addresses, and interface indices.
Service map -- stores ClusterIP/NodePort/LoadBalancer frontends and their backend pod IPs.
Policy map -- stores allow/deny entries per endpoint, keyed by the remote security identity together with port, protocol, and traffic direction.
Connection tracking map -- tracks active connections for stateful policy enforcement and NAT.
Identity map -- maps numeric identity values to label selectors.
Tunnel map -- maps remote node IPs to tunnel endpoints (in VXLAN/Geneve mode).
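
How the endpoint and policy maps work together can be sketched in a few lines. This is a simplified model using Python dictionaries: the IPs, identities, and MAC addresses are hypothetical, the policy entries are flattened into a single dictionary for brevity, and unknown traffic is dropped by default purely to keep the example short.

```python
from typing import Dict, NamedTuple, Tuple


class Endpoint(NamedTuple):
    identity: int
    mac: str


# Hypothetical contents; real maps are kernel data structures, not Python dicts.
endpoint_map: Dict[str, Endpoint] = {
    "10.0.1.10": Endpoint(identity=1001, mac="0a:00:00:00:00:01"),
    "10.0.1.20": Endpoint(identity=2002, mac="0a:00:00:00:00:02"),
}

# Key: (source identity, destination identity, destination port, protocol).
policy_map: Dict[Tuple[int, int, int, str], str] = {
    (1001, 2002, 5432, "tcp"): "allow",
}


def verdict(src_ip: str, dst_ip: str, dport: int, proto: str) -> str:
    """Resolve both IPs to identities, then do a single policy lookup."""
    src = endpoint_map.get(src_ip)
    dst = endpoint_map.get(dst_ip)
    if src is None or dst is None:
        return "drop"  # unknown endpoint: default deny in this sketch only
    return policy_map.get((src.identity, dst.identity, dport, proto), "drop")
```

The point of the sketch is that the policy decision depends on identities, so the policy entries do not change when a pod is rescheduled and receives a new IP; only the endpoint map entry does.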

BPF map memory

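BPF maps have fixed maximum sizes and live in kernel memory. At large scale (10,000+ pods), the endpoint map and connection tracking map can consume significant memory. Tune map sizes via Helm values such as bpf.mapDynamicSizeRatio, bpf.ctTcpMax, bpf.ctAnyMax, and bpf.policyMapMax; check the Helm reference for your Cilium version, since the available keys have changed between releases.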
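
A rough way to reason about map memory is entries multiplied by entry size. The sketch below does only that arithmetic. The entry count and the key and value sizes are hypothetical placeholders, not Cilium's actual struct sizes, and the result ignores per-entry kernel overhead, so treat it as a lower bound.

```python
def map_memory_bytes(max_entries: int, key_size: int, value_size: int) -> int:
    """Payload size of a fully populated map, ignoring kernel bookkeeping overhead."""
    return max_entries * (key_size + value_size)


# Hypothetical numbers chosen for illustration only.
ct_bytes = map_memory_bytes(max_entries=512_000, key_size=40, value_size=56)
ct_mib = ct_bytes / (1024 * 1024)
```

With these placeholder values the payload comes to roughly 47 MiB per node for one map, which is why map limits are worth reviewing before scaling a cluster.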

Hubble¶

Hubble is Cilium's built-in network observability platform:

Hubble agent -- embedded in the Cilium agent. Collects flow events (L3/L4/L7 metadata, policy verdicts) and serves them through a gRPC API on port 4244.
Hubble Relay -- a Deployment that connects to all per-node Hubble agents and aggregates flows cluster-wide. Exposes a unified gRPC API on port 4245.
Hubble UI -- web interface providing a service dependency map, flow table, and policy visualization.
Hubble CLI -- command-line tool for querying flows (hubble observe), filtering by pod, namespace, identity, verdict, or L7 protocol.

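Hubble can show DNS queries, HTTP requests/responses, gRPC calls, TCP handshake details, and policy drop reasons -- all without application changes or sidecars. L7 data (DNS, HTTP, gRPC) is only available for traffic that passes through Cilium's L7 proxy, which typically means an L7 policy or visibility rule must select it.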
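
The aggregation role of Hubble Relay can be pictured as merging several per-node event streams into one cluster-wide stream. The sketch below assumes each node's stream is already ordered by time and merges them with `heapq.merge`. The `Flow` type and its fields are a hypothetical simplification, not Hubble's flow schema.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Flow(NamedTuple):
    timestamp: float
    node: str
    verdict: str


def aggregate(*per_node: Iterable[Flow]) -> Iterator[Flow]:
    """Merge time-ordered per-node streams into one time-ordered stream."""
    return heapq.merge(*per_node, key=lambda flow: flow.timestamp)


# Hypothetical flows from two nodes.
node1 = [Flow(1.0, "node-1", "FORWARDED"), Flow(4.0, "node-1", "DROPPED")]
node2 = [Flow(2.0, "node-2", "FORWARDED"), Flow(3.0, "node-2", "DROPPED")]

merged = list(aggregate(node1, node2))
dropped = [flow for flow in merged if flow.verdict == "DROPPED"]
```

Filtering on the merged stream, as in the last line, mirrors what a cluster-wide query by verdict does conceptually.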

Tetragon¶

Tetragon is a separate eBPF-based runtime security component that can run alongside Cilium:

Process execution monitoring -- traces execve(), fork(), and other process lifecycle events via kernel tracepoints.
File access monitoring -- observes file open, read, write, and permission change events.
Network-level observability -- tracks TCP connects, accepts, and socket-level events at the kernel level.
Tracing policies -- defines what events to monitor via TracingPolicy CRDs with eBPF-based filtering.
Real-time enforcement -- can block suspicious process executions or file accesses in real time.

Tetragon operates independently of the Cilium CNI datapath. It attaches to kernel tracepoints and kprobes rather than TC/XDP hooks.

ClusterMesh¶

ClusterMesh enables multi-cluster networking by connecting multiple Cilium-managed clusters:

Cross-cluster pod communication -- pods in different clusters can communicate using their original pod IPs.
Shared service discovery -- Kubernetes services can have backends in multiple clusters.
Global identity -- security identities are synchronized across clusters so policies apply consistently.
etcd key-value store -- each cluster exposes its Cilium etcd data to peer clusters for endpoint synchronization.

graph LR
    subgraph "Cluster A"
        CA_AGENT["Cilium Agents"]
        CA_ETCD["Cilium etcd<br/>(endpoint data)"]
    end

    subgraph "Cluster B"
        CB_AGENT["Cilium Agents"]
        CB_ETCD["Cilium etcd"]
    end

    CA_AGENT -->|"sync endpoints"| CA_ETCD
    CB_AGENT -->|"sync endpoints"| CB_ETCD
    CA_ETCD -->|"cross-cluster<br/>read-only"| CB_AGENT
    CB_ETCD -->|"cross-cluster<br/>read-only"| CA_AGENT

IPAM Modes¶

Mode	How it works	When to use
Kubernetes	Delegates IP allocation to the Kubernetes `Node` resource `.spec.podCIDR`. Each node gets a CIDR from the cluster CIDR range.	Standard Kubernetes clusters where kube-controller-manager allocates CIDRs
Cluster Pool (default)	Cilium Operator allocates CIDR blocks from a configured pool and assigns them to nodes via CiliumNode CRDs.	Environments where kube-controller-manager does not allocate CIDRs
ENI	Allocates AWS Elastic Network Interface IPs directly to pods. Each pod gets a VPC-native IP.	AWS EKS or self-managed AWS clusters
Azure IPAM	Allocates IPs from Azure VNet subnets directly to pods.	Azure AKS
Multi-Pool	Supports multiple IP pools with different CIDR ranges, allowing per-pod or per-namespace pool selection.	Advanced use cases requiring multiple IP ranges
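
To make the Cluster Pool row concrete, the sketch below carves a pool CIDR into fixed-size per-node blocks with Python's standard `ipaddress` module. The function name, the pool `10.0.0.0/16`, the /24 block size, and the node names are assumptions for illustration; the operator's real allocation logic is more involved.

```python
import ipaddress
from typing import Dict, List


def allocate_node_cidrs(pool: str, node_prefix: int, nodes: List[str]) -> Dict[str, str]:
    """Hand out consecutive blocks of size /node_prefix from the pool, one per node."""
    blocks = ipaddress.ip_network(pool).subnets(new_prefix=node_prefix)
    allocation: Dict[str, str] = {}
    for node in nodes:
        try:
            allocation[node] = str(next(blocks))
        except StopIteration:
            raise RuntimeError(f"pool {pool} exhausted at node {node}") from None
    return allocation


cidrs = allocate_node_cidrs("10.0.0.0/16", 24, ["node-1", "node-2", "node-3"])
```

A /16 pool split into /24 blocks yields 256 blocks, so in this model the pool size and per-node block size together cap the number of nodes.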

kube-proxy Replacement¶

Cilium can fully replace kube-proxy by handling all Kubernetes service types via eBPF:

Service Type	kube-proxy (iptables)	Cilium (eBPF)
ClusterIP	iptables DNAT rules	TC eBPF + socket-level connect-time LB
NodePort	iptables DNAT + kube-proxy port binding	TC eBPF on host interfaces
LoadBalancer	iptables DNAT via kube-proxy	XDP acceleration + TC eBPF
ExternalIPs	iptables DNAT	TC eBPF
HostPort	portmap CNI plugin	TC eBPF

Performance advantage

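Cilium's connect-time load balancing intercepts connect() at the socket level and rewrites the ClusterIP to a backend pod IP before any packet is sent. For pod-to-service traffic this removes per-packet DNAT and the NAT conntrack entries that iptables-based kube-proxy would create. Traffic entering the cluster from outside (NodePort, LoadBalancer) is still translated per packet at TC or XDP.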

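Migration path: Set kubeProxyReplacement=true in Helm values. Older releases used string values instead (strict, partial, disabled), where partial let Cilium take over individual service types gradually while kube-proxy kept running; check which form your Cilium version accepts.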

Routing Modes¶

Mode	Description	When to use
Tunnel (VXLAN/Geneve)	Encapsulates pod traffic in VXLAN or Geneve overlays between nodes. No underlying network dependency.	Cloud environments, any network fabric
Native Routing (direct)	Programs kernel routes for pod CIDRs. Requires the underlying network to route pod CIDRs between nodes.	Bare-metal, on-prem, cloud VPCs with custom routing
Hybrid	Native routing within the same subnet, tunneling across subnets.	Mixed environments
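
One practical difference between the modes is the encapsulation overhead that tunneling subtracts from the usable MTU. The sketch below shows the arithmetic using standard header sizes (outer IP, UDP, an 8-byte VXLAN or Geneve base header, and the inner Ethernet header). The function name and parameters are hypothetical, and Cilium derives MTU on its own, so this only illustrates where the numbers come from.

```python
OUTER_IPV4 = 20
OUTER_IPV6 = 40
UDP = 8
TUNNEL_HEADER = 8    # VXLAN header; also the Geneve base header without options
INNER_ETHERNET = 14


def pod_mtu(link_mtu: int, tunnel: bool, outer_ipv6: bool = False,
            geneve_options: int = 0) -> int:
    """Largest inner IP packet that fits in one outer frame."""
    if not tunnel:
        return link_mtu  # native routing adds no encapsulation
    outer_ip = OUTER_IPV6 if outer_ipv6 else OUTER_IPV4
    overhead = outer_ip + UDP + TUNNEL_HEADER + geneve_options + INNER_ETHERNET
    return link_mtu - overhead


vxlan_mtu = pod_mtu(1500, tunnel=True)
native_mtu = pod_mtu(1500, tunnel=False)
```

On a standard 1500-byte link this gives 1450 bytes for tunneled pod traffic over IPv4 and the full 1500 bytes for native routing.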

How It Works¶

eBPF data plane internals, packet flow, Hubble observability pipeline, and Tetragon security enforcement.

eBPF Data Plane¶

Traditional CNIs use iptables, a linear chain of rules whose per-packet evaluation cost grows with the number of rules (O(n)). Cilium replaces this with eBPF hash maps that provide average-case constant-time O(1) lookups in kernel space.

flowchart LR
    subgraph Traditional["iptables-based CNI"]
        PKT1["Packet"] --> R1["Rule 1"] --> R2["Rule 2"] --> R3["Rule 3"] --> RN["Rule N\n(O(n) traversal)"]
    end

    subgraph Cilium_DP["Cilium eBPF Data Plane"]
        PKT2["Packet"] --> MAP["eBPF Hash Map\n(O(1) lookup)"] --> Action["Allow / Drop / Redirect"]
    end

    style Traditional fill:#c62828,color:#fff
    style Cilium_DP fill:#2e7d32,color:#fff
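
The difference between the two lookup strategies can be shown with a toy comparison. The rule format and the identity and port values are assumptions made for this sketch; it models only the lookup cost, not iptables or eBPF themselves.

```python
from typing import Dict, List, Tuple

Rule = Tuple[int, int, str]  # (source identity, destination port, action)


def linear_match(rules: List[Rule], identity: int, port: int) -> Tuple[str, int]:
    """Scan rules in order; return the action and how many rules were examined."""
    for examined, (rule_identity, rule_port, action) in enumerate(rules, start=1):
        if rule_identity == identity and rule_port == port:
            return action, examined
    return "drop", len(rules)


def hashed_match(table: Dict[Tuple[int, int], str], identity: int, port: int) -> str:
    """Single hash lookup, independent of how many entries the table holds."""
    return table.get((identity, port), "drop")


# Hypothetical rule set: 10,000 identities allowed on port 80.
rules: List[Rule] = [(i, 80, "allow") for i in range(1, 10_001)]
table = {(identity, port): action for identity, port, action in rules}

action, examined = linear_match(rules, 10_000, 80)
```

Matching the last rule in the list requires examining all 10,000 entries in the linear case, while the dictionary lookup does the same amount of work regardless of table size.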

Packet Flow — Pod-to-Pod (Same Node)¶

sequenceDiagram
    participant PodA as Pod A
    participant VETH_A as veth (PodA)
    participant TC_A as TC eBPF (egress)
    participant TC_B as TC eBPF (ingress)
    participant VETH_B as veth (PodB)
    participant PodB as Pod B

    PodA->>VETH_A: Send packet
    VETH_A->>TC_A: eBPF TC hook (egress)
    TC_A->>TC_A: L3/L4 policy check
    TC_A->>TC_A: Conntrack lookup
    TC_A->>TC_B: Direct redirect (bpf_redirect)
    TC_B->>VETH_B: Deliver to PodB veth
    VETH_B->>PodB: Receive packet

    Note over TC_A,TC_B: Bypasses host network stack entirely

Packet Flow — Pod-to-Service (ClusterIP)¶

sequenceDiagram
    participant Pod as Client Pod
    participant eBPF as eBPF (socket/TC)
    participant CT as Conntrack Map
    participant SVC as Service Map
    participant Backend as Backend Pod

    Pod->>eBPF: connect() to ClusterIP:port
    eBPF->>SVC: Lookup Service → backend selection
    eBPF->>eBPF: DNAT to backend Pod IP
    eBPF->>CT: Create conntrack entry
    eBPF->>Backend: Forward packet directly

    Note over eBPF: Socket-level LB: resolved at connect(),<br/>not per-packet like kube-proxy/iptables
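
The connect-time behavior in the diagram above can be sketched as follows. This is a conceptual model only: the service and backend addresses are hypothetical, `Connection` is an illustrative class rather than a real socket, and random backend selection is an assumption standing in for whatever algorithm is configured.

```python
import random
from typing import Dict, List, Tuple

Address = Tuple[str, int]

# Hypothetical service map: ClusterIP frontend -> backend pod addresses.
service_map: Dict[Address, List[Address]] = {
    ("10.96.0.10", 80): [("10.0.1.5", 8080), ("10.0.2.7", 8080)],
}


def resolve_at_connect(dst: Address, rng: random.Random) -> Address:
    """Pick a backend once, at connect time."""
    backends = service_map.get(dst)
    if not backends:
        return dst  # not a service frontend: leave the destination untouched
    return rng.choice(backends)


class Connection:
    def __init__(self, dst: Address, rng: random.Random) -> None:
        self.peer = resolve_at_connect(dst, rng)  # one lookup per connection

    def send(self, payload: bytes) -> Address:
        # Every packet reuses the backend chosen at connect time.
        return self.peer


rng = random.Random(0)
conn = Connection(("10.96.0.10", 80), rng)
direct = Connection(("10.0.9.9", 443), rng)
```

Because the destination is rewritten before the first packet leaves the socket, no per-packet translation step appears in `send`.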

Hubble Observability Pipeline¶

flowchart TB
    subgraph Kernel["Kernel Space"]
        eBPF_H["eBPF Programs\n(TC, socket, XDP)"]
        PerfBuf["Perf Event Buffer\n(flow events)"]
    end

    subgraph Userspace["Hubble Stack"]
        HubbleAgent["Hubble Agent\n(per-node, embedded in cilium-agent)"]
        HubbleRelay["Hubble Relay\n(cluster-wide aggregation)"]
        HubbleUI["Hubble UI\n(service map, flow table)"]
    end

    subgraph Export["Export"]
        Prom["Prometheus\n(metrics)"]
        OTEL["OpenTelemetry\n(traces)"]
        SIEM["SIEM / Log\n(JSON export)"]
    end

    eBPF_H -->|"perf events"| PerfBuf
    PerfBuf --> HubbleAgent
    HubbleAgent -->|"gRPC"| HubbleRelay
    HubbleRelay --> HubbleUI
    HubbleAgent --> Prom
    HubbleRelay --> OTEL
    HubbleRelay --> SIEM

    style Kernel fill:#f9a825,color:#000
    style Userspace fill:#7b1fa2,color:#fff

Tetragon — Runtime Security¶

flowchart LR
    subgraph Kernel_T["Kernel"]
        LSM["LSM Hooks"]
        Kprobes["kprobes / tracepoints"]
        eBPF_T["Tetragon eBPF\nprograms"]
    end

    subgraph Userspace_T["Tetragon Agent"]
        PolicyEngine["Policy Engine\n(TracingPolicy CRDs)"]
        EventProc["Event Processor"]
    end

    subgraph Actions["Enforcement"]
        Log["Log event"]
        Kill["Kill process\n(SIGKILL)"]
        Alert["Send alert"]
    end

    LSM --> eBPF_T
    Kprobes --> eBPF_T
    eBPF_T --> EventProc
    PolicyEngine --> eBPF_T
    EventProc --> Log
    EventProc --> Kill
    EventProc --> Alert

    style Kernel_T fill:#c62828,color:#fff

Tetragon Use Cases¶

Use Case	TracingPolicy Target
Detect container escape	Monitor `setns`, `unshare` syscalls
Block crypto mining	Kill processes connecting to mining pools
File integrity	Alert on writes to `/etc/passwd`, `/etc/shadow`
Network forensics	Log all TCP connections from a namespace
Privilege escalation	Detect `setuid(0)` calls

eBPF Map Types Used¶

Map Type	Purpose
Hash map	Policy rules, service → endpoint mapping
LRU hash	Conntrack entries (connection state)
Array	Per-CPU counters, configuration
Perf event array	Event export to Hubble
LPM trie	CIDR-based policy matching
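
The LPM trie row is the least intuitive of these, so here is a minimal sketch of longest-prefix matching over IPv4 CIDRs. It is a plain binary trie written for readability, not the kernel's implementation, and the class names, CIDRs, and actions are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


class _Node:
    def __init__(self) -> None:
        self.children: Dict[int, "_Node"] = {}
        self.value: Optional[str] = None


class LpmTrie:
    """Binary trie over IPv4 prefixes, one level per prefix bit."""

    def __init__(self) -> None:
        self._root = _Node()

    def insert(self, cidr: str, value: str) -> None:
        net = ipaddress.IPv4Network(cidr)
        bits = int(net.network_address)
        node = self._root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.children.setdefault(bit, _Node())
        node.value = value

    def lookup(self, ip: str) -> Optional[str]:
        """Return the value of the longest stored prefix that contains ip."""
        bits = int(ipaddress.IPv4Address(ip))
        node = self._root
        best = node.value
        for i in range(32):
            node = node.children.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            if node.value is not None:
                best = node.value  # a longer matching prefix overrides a shorter one
        return best


trie = LpmTrie()
trie.insert("10.0.0.0/8", "allow")
trie.insert("10.1.0.0/16", "deny")
```

With these entries, an address inside `10.1.0.0/16` resolves to the more specific rule even though the broader `10.0.0.0/8` also contains it.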

Benchmarks¶

Scope

Performance characteristics, scaling limits, and resource consumption for Cilium.

eBPF Performance¶

Feature	Throughput	Latency	Notes
Pod-to-Pod (same node)	40+ Gbps	10-20 µs	TC redirect between veth pairs
Pod-to-Pod (cross node)	9.5+ Gbps	50-100 µs	VXLAN/Geneve
Service load balancing	9+ Gbps	20-50 µs	Maglev hashing
kube-proxy replacement	5-15% higher than iptables	20-40% lower than iptables	eBPF socket-level

Scaling Limits¶

Dimension	Limit	Notes
Nodes per cluster	5,000+	Tested by Isovalent
Endpoints per node	1,000+	eBPF map capacity
Network policies	100,000+	CiliumNetworkPolicy
Identities (security)	65,535	Per-cluster identity space

Resource Consumption¶

Cluster Size	Agent CPU	Agent Memory	Operator Memory
< 100 nodes	100-200m	256Mi	128Mi
100-500 nodes	200-500m	512Mi	256Mi
500+ nodes	500m-1	1Gi+	512Mi

Sourcing Status¶

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.