Architecture
Overview
Cilium is an eBPF-based CNI plugin for Kubernetes that provides networking, security, and observability. Unlike traditional CNI plugins that rely on iptables, Cilium runs eBPF programs inside the Linux kernel to handle packet forwarding, policy enforcement, load balancing, and tracing. It supports both tunnel (VXLAN/Geneve) and native (direct) routing modes, and can optionally replace kube-proxy entirely.
See also: security for identity-based policies, transparent encryption, and Tetragon.
Component Diagram
```mermaid
graph TB
subgraph "Kubernetes Control Plane"
KAPI["Kubernetes API Server"]
end
subgraph "cilium-operator (Deployment)"
OPER["Cilium Operator<br/>IPAM, CRD management,<br/>resource garbage collection"]
end
subgraph "cilium-agent DaemonSet: Node 1"
AGENT1["Cilium Agent<br/>compiles & loads eBPF,<br/>watches K8s resources"]
BPF1["eBPF Programs<br/>TC + XDP + socket-level"]
MAPS1["BPF Maps<br/>endpoints, services,<br/>identities, policies"]
CNI1["CNI plugin"]
end
subgraph "cilium-agent DaemonSet: Node 2"
AGENT2["Cilium Agent"]
BPF2["eBPF Programs"]
MAPS2["BPF Maps"]
CNI2["CNI plugin"]
end
subgraph "Observability Stack"
HUB_RELAY["Hubble Relay<br/>aggregates flows from all nodes"]
HUB_UI["Hubble UI<br/>service map visualization"]
end
subgraph "Runtime Security"
TETRA["Tetragon<br/>eBPF-based process &<br/>file observability"]
end
KAPI -->|watches pods, services, policies| AGENT1
KAPI -->|watches pods, services, policies| AGENT2
KAPI -->|CRD management| OPER
OPER -->|IPAM allocation| AGENT1
OPER -->|IPAM allocation| AGENT2
CNI1 -->|"create endpoint"| AGENT1
CNI2 -->|"create endpoint"| AGENT2
AGENT1 -->|loads into kernel| BPF1
AGENT1 -->|populates| MAPS1
AGENT2 -->|loads into kernel| BPF2
AGENT2 -->|populates| MAPS2
BPF1 -->|"perf events"| AGENT1
BPF2 -->|"perf events"| AGENT2
AGENT1 -->|"flow events<br/>(Hubble gRPC :4244)"| HUB_RELAY
AGENT2 -->|"flow events<br/>(Hubble gRPC :4244)"| HUB_RELAY
HUB_RELAY -->|"API queries"| HUB_UI
TETRA -.->|"runs alongside<br/>(own eBPF programs)"| AGENT1
TETRA -.->|"runs alongside<br/>(own eBPF programs)"| AGENT2
```
Core Components
Cilium Agent
The Cilium agent (cilium-agent) runs as a DaemonSet on every node. It is the central control-plane component per node:
- eBPF program management -- compiles, loads, and attaches eBPF programs to TC (traffic control) and XDP hooks on network interfaces and to cgroup socket hooks. Regenerates and hot-swaps an endpoint's programs when its datapath configuration changes; most policy updates only rewrite BPF map entries.
- Endpoint tracking -- watches the Kubernetes API for pod creation/deletion, creates Cilium endpoints, and assigns security identities.
- Identity allocation -- groups pods with identical security-relevant labels into a shared numeric security identity. In tunnel mode, the source identity is carried in the VXLAN/Geneve header, so the receiving node can enforce policy without mapping the source IP back to an identity; in native-routing mode, the receiving node looks up the identity from the source IP in the ipcache map.
- Service load balancing -- programs BPF maps for ClusterIP, NodePort, LoadBalancer, and ExternalIPs services. In kube-proxy replacement mode, this entirely replaces iptables-based service routing.
- Policy enforcement -- translates CiliumNetworkPolicy and Kubernetes NetworkPolicy into per-endpoint BPF policy map entries, which the eBPF programs on TC hooks consult.
- Routing -- programs kernel routes for direct-routing mode, or manages VXLAN/Geneve tunnel interfaces for overlay mode.
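The identity-allocation step can be modeled in a few lines. This is a minimal Python sketch, not Cilium's allocator: it assumes user identities start at 256 (lower values being reserved for entities such as host and world) and uses a hypothetical IGNORED_LABEL_KEYS set to stand in for Cilium's label filtering. Real allocation is coordinated cluster-wide through CiliumIdentity CRDs or a kvstore.

```python
from typing import Dict, FrozenSet, Mapping, Tuple

# Assumption: numeric identities below 256 are reserved (host, world, ...),
# so this toy allocator starts user identities at 256.
FIRST_USER_IDENTITY = 256

# Hypothetical filter: label keys that should not split pods into
# different identities (e.g. per-rollout hashes).
IGNORED_LABEL_KEYS = {"pod-template-hash", "controller-revision-hash"}

LabelSet = FrozenSet[Tuple[str, str]]


class IdentityAllocator:
    """Toy model: pods with the same security-relevant labels share one ID."""

    def __init__(self) -> None:
        self._ids: Dict[LabelSet, int] = {}
        self._next_id = FIRST_USER_IDENTITY

    def identity_for(self, labels: Mapping[str, str]) -> int:
        # Drop ignored keys; the remaining (key, value) set is the lookup key.
        key: LabelSet = frozenset(
            (k, v) for k, v in labels.items() if k not in IGNORED_LABEL_KEYS
        )
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


alloc = IdentityAllocator()
web_1 = alloc.identity_for({"app": "web", "pod-template-hash": "7d9f"})
web_2 = alloc.identity_for({"app": "web", "pod-template-hash": "5c1a"})
db = alloc.identity_for({"app": "db"})
print(web_1, web_2, db)  # 256 256 257
```

Both web pods share identity 256 because their only differing label is ignored. Policy state therefore grows with the number of distinct label sets, not with the number of pods.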
Cilium Operator
The Cilium Operator runs as a Deployment (typically 1-2 replicas) and handles cluster-wide coordination:
- IPAM management -- allocates CIDR blocks to nodes in cluster-pool IPAM mode. In Kubernetes IPAM mode, per-node CIDRs come instead from each Node resource's spec.podCIDR, which kube-controller-manager allocates.
- CRD lifecycle -- manages CiliumIdentity, CiliumEndpoint, CiliumNode, and other CRDs.
- Garbage collection -- cleans up stale resources (orphaned identities, stale endpoints).
- Cross-node coordination -- handles ClusterMesh endpoint synchronization.
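CNI Plugin
The CNI binary (cilium-cni) is invoked by the container runtime (for example containerd or CRI-O) when the kubelet asks it to set up a pod sandbox. It:
- Creates a veth pair and moves one end into the pod's network namespace.
- Requests an IP address from the agent's IPAM and configures it on the pod interface.
- Calls the Cilium agent API to create an endpoint for the pod. The agent then resolves the pod's identity, computes its policy, and loads the endpoint's eBPF programs.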
eBPF Datapath
Cilium's datapath is built entirely on eBPF. Programs are attached at multiple hook points in the kernel networking stack.
```mermaid
graph TD
subgraph "Packet Flow: Ingress Path"
NIC["Network Interface<br/>(eth0)"]
XDP["XDP Hook<br/>eBPF program<br/>(DDoS mitigation,<br/>early drop)"]
TC_ING["TC Ingress Hook<br/>eBPF program<br/>(policy check,<br/>load balancing,<br/>routing)"]
L3["L3 Routing<br/>(kernel FIB)"]
VETH["Pod veth pair"]
end
subgraph "Packet Flow: Egress Path"
POD_EG["Pod sends packet"]
TC_EG["TC Egress Hook<br/>eBPF program<br/>(policy check,<br/>SNAT/masquerade)"]
SOCK_LB["Socket-level LB<br/>cgroup BPF socket hooks<br/>(connect-time LB,<br/>no per-packet NAT)"]
end
subgraph "BPF Maps (shared state)"
EP_MAP["Endpoint Map<br/>pod IP -> endpoint<br/>(MAC, ifindex)"]
SVC_MAP["Service Map<br/>ClusterIP -> backend IPs"]
POL_MAP["Policy Map<br/>identity -> allow/deny rules"]
CT_MAP["Connection Tracking<br/>Map"]
end
NIC -->|"raw packet"| XDP
XDP -->|"pass"| TC_ING
TC_ING -->|"lookup"| EP_MAP
TC_ING -->|"service resolve"| SVC_MAP
TC_ING -->|"policy check"| POL_MAP
TC_ING -->|"allowed"| L3
L3 -->|"local delivery"| VETH
POD_EG -->|"packet"| TC_EG
POD_EG -->|"connect()"| SOCK_LB
SOCK_LB -->|"direct to backend"| DEST["Destination Pod"]
TC_EG -->|"lookup"| CT_MAP
TC_EG -->|"policy check"| POL_MAP
TC_EG -->|"forward"| NIC
```
eBPF Hook Points
| Hook Point | Attachment | Purpose |
|---|---|---|
| XDP (eXpress Data Path) | Network driver, before skb allocation | Early packet drop (prefilter, DDoS mitigation), NodePort/LoadBalancer acceleration, including Direct Server Return (DSR) |
| TC ingress | qdisc on physical and veth interfaces | Policy enforcement, service load balancing, routing decisions |
| TC egress | qdisc on physical and veth interfaces | SNAT, policy enforcement, tunnel encapsulation |
| Socket-level BPF | cgroup attach | Connect-time load balancing (no per-packet service NAT), socket redirect |
| cgroup/connect | cgroup | Intercept connect() syscalls to resolve ClusterIP to backend pod IP |
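BPF Maps
BPF maps are the shared data structures between the Cilium agent (userspace) and eBPF programs (kernel):
- Endpoint map -- maps local pod IPs to endpoint information (endpoint ID, MAC addresses, interface index) used to deliver traffic to local pods.
- IPcache map -- maps IP addresses and CIDR prefixes, local and remote, to security identities; this is how the datapath learns a peer's identity from its IP. The mapping from a numeric identity back to its labels is kept in userspace by the agent.
- Service map -- stores ClusterIP/NodePort/LoadBalancer frontends and their backend pod IPs.
- Policy map -- one per endpoint; stores allow/deny entries keyed by remote security identity, destination port, protocol, and traffic direction.
- Connection tracking map -- tracks active connections for stateful policy enforcement and NAT.
- Tunnel map -- maps remote pod CIDR prefixes to the IP of the node hosting them, used as the tunnel endpoint in VXLAN/Geneve mode.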
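The service map lookup can be sketched as a dictionary from frontend to backend list. This Python model is illustrative only: the (IP, port, protocol) key and the Backend type are hypothetical simplifications (Cilium's real maps use packed C structs and split frontends and backends into separate maps), and random.choice stands in for the datapath's backend selection (random by default, with Maglev consistent hashing as an option).

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical key layout: (frontend IP, port, protocol).
Frontend = Tuple[str, int, str]


@dataclass(frozen=True)
class Backend:
    ip: str
    port: int


class ServiceMap:
    """Toy service map: frontend -> list of backends, written by userspace."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._services: Dict[Frontend, List[Backend]] = {}
        self._rng = rng or random.Random()

    def upsert(self, frontend: Frontend, backends: List[Backend]) -> None:
        # The agent writes entries; the datapath only reads them.
        self._services[frontend] = list(backends)

    def select(self, frontend: Frontend) -> Optional[Backend]:
        backends = self._services.get(frontend)
        if not backends:
            return None  # not a service frontend, or no backends
        # Random pick as a stand-in for the datapath's selection algorithm.
        return self._rng.choice(backends)


svc = ServiceMap(rng=random.Random(42))
dns = ("10.96.0.10", 53, "UDP")
svc.upsert(dns, [Backend("10.0.1.5", 53), Backend("10.0.2.7", 53)])
print(svc.select(dns))                        # one of the two backends
print(svc.select(("10.96.0.11", 80, "TCP")))  # None
```

The key design point mirrored here is the split of responsibilities: userspace rewrites entries when Endpoints change, while the per-packet (or per-connect) path performs only a keyed read.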
BPF map memory
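Many BPF maps, notably the LRU-based connection tracking and NAT maps, are pre-allocated in kernel memory at their configured maximum size. At large scale (10,000+ pods, or nodes with many concurrent connections), these maps can consume significant memory. Tune map sizes via Helm values such as bpf.mapDynamicSizeRatio (sizes maps relative to node memory) or explicit limits like bpf.ctTcpMax, bpf.ctAnyMax, bpf.natMax, and bpf.policyMapMax.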
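Because a pre-allocated map reserves memory for every entry up front, its footprint is roughly entries × per-entry size. The helper below does that arithmetic; the key, value, and overhead sizes in the example are hypothetical placeholders, since real per-entry costs depend on the map type and kernel version.

```python
def estimate_map_mib(entries: int, key_bytes: int, value_bytes: int,
                     overhead_bytes: int = 0) -> float:
    """Rough memory reserved by a pre-allocated hash-style BPF map, in MiB.

    All sizes are caller-supplied assumptions; real per-element overhead
    depends on the map type and kernel version.
    """
    per_entry = key_bytes + value_bytes + overhead_bytes
    return entries * per_entry / (1024 * 1024)


# Hypothetical sizing: 1,048,576 entries at 64 bytes each.
print(estimate_map_mib(1 << 20, key_bytes=16, value_bytes=32, overhead_bytes=16))  # 64.0
# Halving the entry limit halves the reservation.
print(estimate_map_mib(1 << 19, key_bytes=16, value_bytes=32, overhead_bytes=16))  # 32.0
```

Since the reservation scales linearly with the entry limit, the connection tracking and NAT limits are the first knobs to check on memory-constrained nodes.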
Hubble
Hubble is Cilium's built-in network observability platform:
- Hubble agent -- embedded in the Cilium agent. Serves flow events (L3/L4/L7 metadata, policy verdicts) through a gRPC API on port 4244.
- Hubble Relay -- a Deployment that connects to all per-node Hubble agents and aggregates flows cluster-wide. Exposes a unified gRPC API on port 4245.
- Hubble UI -- web interface providing a service dependency map, flow table, and policy visualization.
- Hubble CLI -- command-line tool for querying flows (hubble observe), filtering by pod, namespace, identity, verdict, or L7 protocol.
Hubble can see DNS queries, HTTP requests/responses, gRPC calls, TCP handshake details, and policy drop reasons without application changes or sidecars. L3/L4 flows are always visible; L7 visibility requires a policy that redirects the traffic through Cilium's node-local proxies (the built-in DNS proxy for DNS, Envoy for HTTP and gRPC).
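Flow records can also be post-processed outside the UI. The sketch below counts dropped flows per namespace pair and drop reason from JSON lines such as those printed by hubble observe -o json. The field names it reads (verdict, source.namespace, destination.namespace, drop_reason_desc) are assumptions about the flow schema; check them against the output of your Hubble version.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def count_drops(lines: Iterable[str]) -> Counter:
    """Count dropped flows per (source ns, destination ns, drop reason).

    Assumes one JSON object per line; field names are assumptions about
    the Hubble flow schema.
    """
    drops: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        flow = record.get("flow", record)  # tolerate un-wrapped records
        if flow.get("verdict") != "DROPPED":
            continue
        key: Tuple[str, str, str] = (
            flow.get("source", {}).get("namespace", "?"),
            flow.get("destination", {}).get("namespace", "?"),
            flow.get("drop_reason_desc", "UNKNOWN"),
        )
        drops[key] += 1
    return drops


sample = [
    '{"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED",'
    ' "source": {"namespace": "shop"}, "destination": {"namespace": "db"}}}',
    '{"flow": {"verdict": "FORWARDED",'
    ' "source": {"namespace": "shop"}, "destination": {"namespace": "db"}}}',
]
print(count_drops(sample))
```

To use it on live data, pass sys.stdin as lines while piping hubble observe -o json into the script.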
Tetragon
Tetragon is a separate eBPF-based runtime security component that can run alongside Cilium:
- Process execution monitoring -- traces execve(), fork(), and other process lifecycle events via kernel tracepoints.
- File access monitoring -- observes file open, read, write, and permission change events.
- Network-level observability -- tracks TCP connects, accepts, and socket-level events at the kernel level.
- Tracing policies -- defines what events to monitor via TracingPolicy CRDs with eBPF-based filtering.
- Real-time enforcement -- can kill a process or override a hooked function's return value from inside the kernel, blocking suspicious process executions or file accesses synchronously.
Tetragon operates independently of the Cilium CNI datapath. It attaches to kprobes, tracepoints, and (on kernels with BPF LSM support) LSM hooks rather than TC/XDP hooks.
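Tetragon emits its events as JSON, so simple detections can be prototyped offline. This sketch scans JSON lines for process_exec events whose binary is on a watchlist. The event layout it reads (process_exec.process.binary and process.pod.namespace) is an assumption about Tetragon's export format, and SUSPICIOUS_BINARIES is a hypothetical list. This is after-the-fact detection in userspace; in-kernel blocking is configured through TracingPolicy actions instead.

```python
import json
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical watchlist of binaries worth flagging inside containers.
SUSPICIOUS_BINARIES: FrozenSet[str] = frozenset({"/usr/bin/nc", "/bin/nc", "/usr/bin/nmap"})


def flag_execs(lines: Iterable[str],
               watchlist: FrozenSet[str] = SUSPICIOUS_BINARIES) -> List[Tuple[str, str]]:
    """Return (namespace, binary) for exec events whose binary is watched.

    Assumes one JSON event per line with exec events under "process_exec";
    verify the field names against your Tetragon version's export format.
    """
    hits: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        exec_event = event.get("process_exec")
        if exec_event is None:
            continue  # e.g. process_exit or kprobe events
        process = exec_event.get("process", {})
        binary = process.get("binary", "")
        if binary in watchlist:
            # Host processes have no pod metadata.
            namespace = process.get("pod", {}).get("namespace", "<host>")
            hits.append((namespace, binary))
    return hits


events = [
    '{"process_exec": {"process": {"binary": "/usr/bin/nc", "pod": {"namespace": "default"}}}}',
    '{"process_exec": {"process": {"binary": "/usr/bin/ls", "pod": {"namespace": "default"}}}}',
    '{"process_exit": {"process": {"binary": "/usr/bin/nc"}}}',
]
print(flag_execs(events))  # [('default', '/usr/bin/nc')]
```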
ClusterMesh
ClusterMesh enables multi-cluster networking by connecting multiple Cilium-managed clusters:
- Cross-cluster pod communication -- pods in different clusters can communicate using their original pod IPs, so pod CIDRs must not overlap between connected clusters.
- Shared service discovery -- Kubernetes services can have backends in multiple clusters.
- Global identity -- security identities are synchronized across clusters so policies apply consistently.
- etcd key-value store -- each cluster runs a clustermesh-apiserver backed by etcd that exposes its endpoints, identities, nodes, and services read-only to peer clusters.
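Because ClusterMesh routes on original pod IPs, it helps to check for overlapping pod CIDRs before connecting clusters. The sketch below does that with Python's ipaddress module; the clusters mapping is hypothetical input you would assemble from each cluster's IPAM configuration.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pod_cidrs(clusters: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (cluster, cidr, other_cluster, other_cidr) for every overlap.

    `clusters` maps a cluster name to its pod CIDRs (hypothetical input).
    """
    nets = [(name, ipaddress.ip_network(cidr))
            for name, cidrs in clusters.items() for cidr in cidrs]
    conflicts = []
    for (c1, n1), (c2, n2) in combinations(nets, 2):
        # Only compare across clusters, and only within one IP family.
        if c1 != c2 and n1.version == n2.version and n1.overlaps(n2):
            conflicts.append((c1, str(n1), c2, str(n2)))
    return conflicts


mesh = {
    "cluster-a": ["10.0.0.0/16"],
    "cluster-b": ["10.0.128.0/17"],  # inside cluster-a's range
    "cluster-c": ["10.1.0.0/16", "fd00:c::/48"],
}
print(overlapping_pod_cidrs(mesh))
# [('cluster-a', '10.0.0.0/16', 'cluster-b', '10.0.128.0/17')]
```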
```mermaid
graph LR
subgraph "Cluster A"
CA_AGENT["Cilium Agents"]
CA_ETCD["Cilium etcd<br/>(clustermesh-apiserver,<br/>endpoint data)"]
end
subgraph "Cluster B"
CB_AGENT["Cilium Agents"]
CB_ETCD["Cilium etcd<br/>(clustermesh-apiserver)"]
end
CA_AGENT -->|"sync endpoints"| CA_ETCD
CB_AGENT -->|"sync endpoints"| CB_ETCD
CA_ETCD -->|"cross-cluster<br/>read-only"| CB_AGENT
CB_ETCD -->|"cross-cluster<br/>read-only"| CA_AGENT
```
IPAM Modes
| Mode | How it works | When to use |
|---|---|---|
| Kubernetes | Delegates IP allocation to the Kubernetes Node resource's .spec.podCIDR. Each node gets a CIDR from the cluster CIDR range. | Standard Kubernetes clusters where kube-controller-manager allocates CIDRs |
| Cluster Pool (default) | Cilium Operator allocates CIDR blocks from a configured pool and assigns them to nodes via CiliumNode CRDs. | Environments where kube-controller-manager does not allocate CIDRs |
| ENI | Allocates AWS Elastic Network Interface IPs directly to pods. Each pod gets a VPC-native IP. | AWS EKS or self-managed AWS clusters |
| Azure IPAM | Allocates IPs from Azure VNet subnets directly to pods. | Azure AKS |
| Multi-Pool | Supports multiple IP pools with different CIDR ranges, allowing per-pod or per-namespace pool selection. | Advanced use cases requiring multiple IP ranges |
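Cluster-pool allocation amounts to carving fixed-size per-node blocks out of a larger pool. This toy model uses ipaddress's subnets() to hand out the next free block; the pool CIDR and per-node prefix length are illustrative parameters, not Cilium defaults, and the real operator records allocations in CiliumNode resources rather than in memory.

```python
import ipaddress
from typing import Dict, Iterator, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class ClusterPool:
    """Toy model of cluster-pool IPAM: each node gets the next free block."""

    def __init__(self, pool_cidr: str, node_prefix: int) -> None:
        pool = ipaddress.ip_network(pool_cidr)
        # Lazily enumerate all per-node blocks of the requested size.
        self._free: Iterator[Network] = pool.subnets(new_prefix=node_prefix)
        self.allocations: Dict[str, Network] = {}

    def allocate(self, node: str) -> Network:
        # Idempotent per node: a node keeps the block it already has.
        if node not in self.allocations:
            try:
                self.allocations[node] = next(self._free)
            except StopIteration:
                raise RuntimeError("pod CIDR pool exhausted") from None
        return self.allocations[node]


pool = ClusterPool("10.0.0.0/22", node_prefix=24)
for node in ["node-1", "node-2", "node-1"]:
    print(node, pool.allocate(node))
# node-1 10.0.0.0/24
# node-2 10.0.1.0/24
# node-1 10.0.0.0/24
```

A /22 pool split into /24 blocks supports only four nodes, which is why the pool size and per-node mask have to be chosen together with the expected node count.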
kube-proxy Replacement
Cilium can fully replace kube-proxy by handling all Kubernetes service types via eBPF:
| Service Type | kube-proxy (iptables) | Cilium (eBPF) |
|---|---|---|
| ClusterIP | iptables DNAT rules | TC eBPF + socket-level connect-time LB |
| NodePort | iptables DNAT + kube-proxy port binding | TC eBPF on host interfaces |
| LoadBalancer | iptables DNAT via kube-proxy | TC eBPF, with optional XDP acceleration |
| ExternalIPs | iptables DNAT | TC eBPF |
| HostPort | portmap CNI plugin | TC eBPF |
Performance advantage
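Cilium's connect-time load balancing intercepts connect() at the socket level and resolves the ClusterIP directly to a backend pod IP. Packets therefore leave the pod already addressed to the backend, so pod-to-service traffic needs no per-packet DNAT or reverse translation and creates no service NAT entries. Cilium's own connection tracking map still tracks the connection for policy enforcement.
Migration path: Set kubeProxyReplacement=true in Helm values (Cilium 1.14 and later; earlier releases used kubeProxyReplacement=strict). For a gradual takeover, keep kubeProxyReplacement=false, leave kube-proxy running, and enable individual features such as socketLB.enabled or nodePort.enabled; earlier releases called this partial mode.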
Routing Modes
| Mode | Description | When to use |
|---|---|---|
| Tunnel (VXLAN/Geneve) | Encapsulates pod traffic in VXLAN or Geneve overlays between nodes. No underlying network dependency. | Cloud environments, any network fabric |
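| Native Routing (direct) | Programs kernel routes for pod CIDRs. The network between nodes must be able to route pod IPs (for example via cloud route tables, BGP, or autoDirectNodeRoutes when all nodes share an L2 segment). | Bare-metal, on-prem, cloud VPCs with custom routing |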
| Hybrid | Native routing within the same subnet, tunneling across subnets. | Mixed environments |
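Tunnel mode's main cost is encapsulation overhead on every packet. The arithmetic below computes the largest pod MTU for VXLAN over IPv4, where the outer IPv4 (20 bytes), UDP (8), and VXLAN (8) headers plus the encapsulated frame's inner Ethernet header (14) must fit inside the underlay MTU. Geneve uses the same 8-byte base header plus variable-length options. Cilium normally configures pod MTUs itself; this sketch only illustrates the byte budget.

```python
# Bytes that VXLAN-over-IPv4 adds inside the underlay's IP MTU.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the encapsulated pod frame keeps its Ethernet header
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET  # 50


def max_pod_mtu(underlay_mtu: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Largest pod-side IP MTU that still fits in one underlay packet."""
    if underlay_mtu <= overhead:
        raise ValueError("underlay MTU too small for encapsulation")
    return underlay_mtu - overhead


for mtu in (1500, 9001):
    print(mtu, max_pod_mtu(mtu), f"{VXLAN_OVERHEAD / mtu:.1%} header overhead")
# 1500 1450 3.3% header overhead
# 9001 8951 0.6% header overhead
```

With a standard 1500-byte underlay, pods get 1450 bytes and the headers cost about 3.3% of each full-size packet; on a jumbo-frame network the relative overhead drops below 1%.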
Sources
- Cilium Architecture
- Cilium eBPF Datapath
- kube-proxy Replacement
- Hubble Observability
- IPAM Modes
- Tetragon
How It Works
eBPF data plane internals, packet flow, Hubble observability pipeline, and Tetragon security enforcement.
eBPF Data Plane
Traditional CNIs, and kube-proxy in its iptables mode, match packets against iptables rule chains evaluated in order, so lookup cost grows with the number of rules (O(n)). Cilium replaces this with eBPF hash maps, whose lookups take constant expected time (O(1)) in kernel space regardless of how many services or policies exist.
```mermaid
flowchart LR
subgraph Traditional["iptables-based CNI"]
PKT1["Packet"] --> R1["Rule 1"] --> R2["Rule 2"] --> R3["Rule 3"] --> RN["Rule N<br/>(O(n) traversal)"]
end
subgraph Cilium_DP["Cilium eBPF Data Plane"]
PKT2["Packet"] --> MAP["eBPF Hash Map<br/>(O(1) lookup)"] --> Action["Allow / Drop / Redirect"]
end
style Traditional fill:#c62828,color:#fff
style Cilium_DP fill:#2e7d32,color:#fff
```
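The complexity difference can be illustrated by counting how many rules each approach examines. In this simplified model, chain_lookup walks an iptables-style list until the first match, while map_lookup performs a single keyed probe. The (IP, port) keys are hypothetical, and real kube-proxy rules dispatch through per-service chains and match on more fields, so treat this as an illustration of scaling, not a benchmark.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # simplified match key: (destination IP, port)


def chain_lookup(chain: List[Tuple[Key, str]], key: Key) -> Tuple[Optional[str], int]:
    """iptables-like: test rules in order; return (verdict, rules examined)."""
    for examined, (rule_key, verdict) in enumerate(chain, start=1):
        if rule_key == key:
            return verdict, examined
    return None, len(chain)


def map_lookup(table: Dict[Key, str], key: Key) -> Tuple[Optional[str], int]:
    """Hash-map-like: one keyed probe (expected O(1)) regardless of size."""
    return table.get(key), 1


n = 10_000
chain = [((f"10.96.{i // 256}.{i % 256}", 80), "ACCEPT") for i in range(n)]
table = dict(chain)
worst = chain[-1][0]  # the last rule is the worst case for the chain
print(chain_lookup(chain, worst))  # ('ACCEPT', 10000)
print(map_lookup(table, worst))    # ('ACCEPT', 1)
```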
Packet Flow: Pod-to-Pod (Same Node)
```mermaid
sequenceDiagram
participant PodA as Pod A
participant VETH_A as veth (PodA)
participant TC_A as TC eBPF (egress)
participant TC_B as TC eBPF (ingress)
participant VETH_B as veth (PodB)
participant PodB as Pod B
PodA->>VETH_A: Send packet
VETH_A->>TC_A: eBPF TC hook (egress)
TC_A->>TC_A: Conntrack lookup
TC_A->>TC_A: L3/L4 policy check for new connections<br/>(L7 rules redirect to the proxy)
TC_A->>TC_B: Direct redirect (bpf_redirect)
TC_B->>VETH_B: Deliver to PodB veth
VETH_B->>PodB: Receive packet
Note over TC_A,TC_B: Bypasses the host's iptables and routing stack
```
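The order of operations above (conntrack first, policy only for new connections) can be modeled as below. This is a toy sketch: the 5-tuple set stands in for the connection tracking map, ipcache maps destination IPs to identities, and allowed holds hypothetical (identity, port, protocol) policy entries. Treating unknown IPs as identity 2 (the reserved world identity) and port 0 as a wildcard are assumptions of the sketch, and real conntrack also tracks reply direction and timeouts.

```python
from typing import Dict, Set, Tuple

FiveTuple = Tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, proto
PolicyKey = Tuple[int, int, str]            # remote identity, dst port, proto

WORLD_IDENTITY = 2  # assumption: unknown destinations resolve to "world"
ANY_PORT = 0        # assumption: wildcard port convention for this sketch


class EgressDatapath:
    """Toy per-endpoint egress path: conntrack first, policy for new flows."""

    def __init__(self, allowed: Set[PolicyKey], ipcache: Dict[str, int]) -> None:
        self.allowed = allowed
        self.ipcache = ipcache
        self.conntrack: Set[FiveTuple] = set()
        self.policy_lookups = 0

    def process(self, pkt: FiveTuple) -> str:
        if pkt in self.conntrack:
            return "FORWARD (established)"  # no policy lookup needed
        self.policy_lookups += 1
        _, _, dst_ip, dst_port, proto = pkt
        identity = self.ipcache.get(dst_ip, WORLD_IDENTITY)
        if {(identity, dst_port, proto), (identity, ANY_PORT, proto)} & self.allowed:
            self.conntrack.add(pkt)  # later packets hit the CT fast path
            return "FORWARD (new)"
        return "DROP (policy denied)"


dp = EgressDatapath(allowed={(257, 5432, "TCP")}, ipcache={"10.0.1.9": 257})
flow = ("10.0.1.4", 40000, "10.0.1.9", 5432, "TCP")
print(dp.process(flow))  # FORWARD (new)
print(dp.process(flow))  # FORWARD (established)
print(dp.process(("10.0.1.4", 40001, "10.0.1.9", 22, "TCP")))  # DROP (policy denied)
print(dp.policy_lookups)  # 2
```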
Packet Flow: Pod-to-Service (ClusterIP)
```mermaid
sequenceDiagram
participant Pod as Client Pod
participant eBPF as eBPF (socket/TC)
participant CT as Conntrack Map
participant SVC as Service Map
participant Backend as Backend Pod
Pod->>eBPF: connect() to ClusterIP:port
eBPF->>SVC: Lookup Service → backend selection
eBPF->>eBPF: Rewrite destination to backend Pod IP<br/>(socket address, or per-packet DNAT at TC)
eBPF->>CT: Create conntrack entry
eBPF->>Backend: Forward packet directly
Note over eBPF: Socket-level LB: resolved at connect(),<br/>not per-packet like kube-proxy/iptables
```
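Socket-level load balancing changes where translation happens, not just how fast it is. This ToySocket sketch rewrites the destination once in connect(), so every later send already targets the backend, while getpeername() still reports the ClusterIP, similar to how Cilium's socket hooks keep the service address visible to the application. The class, the services mapping, and the first-backend choice are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]


@dataclass
class ToySocket:
    """Toy model of a TCP socket behind cgroup connect/getpeername hooks."""

    services: Dict[Addr, List[Addr]]      # ClusterIP:port -> backends
    peer: Optional[Addr] = None           # address packets are sent to
    service_addr: Optional[Addr] = None   # address the app connected to
    sent_to: List[Addr] = field(default_factory=list)

    def connect(self, addr: Addr) -> None:
        backends = self.services.get(addr)
        if backends:
            # Translate once, at connect() time. A real datapath applies its
            # load-balancing algorithm; this sketch takes the first backend.
            self.service_addr, self.peer = addr, backends[0]
        else:
            self.peer = addr

    def send(self, payload: bytes) -> int:
        if self.peer is None:
            raise OSError("socket is not connected")
        # Packets already carry the backend address: no per-packet DNAT.
        self.sent_to.append(self.peer)
        return len(payload)

    def getpeername(self) -> Addr:
        if self.peer is None:
            raise OSError("socket is not connected")
        # The application keeps seeing the ClusterIP it asked for.
        return self.service_addr or self.peer


sock = ToySocket(services={("10.96.0.20", 80): [("10.0.2.15", 8080)]})
sock.connect(("10.96.0.20", 80))
for chunk in (b"GET / HTTP/1.1\r\n", b"Host: web\r\n", b"\r\n"):
    sock.send(chunk)
print(sock.sent_to)        # three entries, all ('10.0.2.15', 8080)
print(sock.getpeername())  # ('10.96.0.20', 80)
```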
Hubble Observability Pipeline
```mermaid
flowchart TB
subgraph Kernel["Kernel Space"]
eBPF_H["eBPF Programs<br/>(TC, socket, XDP)"]
PerfBuf["Perf Event Buffer<br/>(flow events)"]
end
subgraph Userspace["Hubble Stack"]
HubbleAgent["Hubble Agent<br/>(per-node, embedded in cilium-agent)"]
HubbleRelay["Hubble Relay<br/>(cluster-wide aggregation)"]
HubbleUI["Hubble UI<br/>(service map, flow table)"]
end
subgraph Export["Export"]
Prom["Prometheus<br/>(metrics)"]
OTEL["OpenTelemetry<br/>(traces)"]
SIEM["SIEM / Log<br/>(JSON export)"]
end
eBPF_H -->|"perf events"| PerfBuf
PerfBuf --> HubbleAgent
HubbleAgent -->|"gRPC"| HubbleRelay
HubbleRelay --> HubbleUI
HubbleAgent -->|"per-node metrics"| Prom
HubbleRelay --> OTEL
HubbleAgent -->|"flow log export"| SIEM
style Kernel fill:#f9a825,color:#000
style Userspace fill:#7b1fa2,color:#fff
```
Tetragon: Runtime Security
```mermaid
flowchart LR
subgraph Kernel_T["Kernel"]
LSM["LSM Hooks"]
Kprobes["kprobes / tracepoints"]
eBPF_T["Tetragon eBPF<br/>programs"]
end
subgraph Userspace_T["Tetragon Agent"]
PolicyEngine["Policy Engine<br/>(TracingPolicy CRDs)"]
EventProc["Event Processor"]
end
subgraph Actions["Enforcement"]
Log["Log event"]
Kill["Kill process<br/>(SIGKILL, sent in kernel)"]
Alert["Send alert"]
end
LSM --> eBPF_T
Kprobes --> eBPF_T
eBPF_T --> EventProc
PolicyEngine --> eBPF_T
eBPF_T -->|"enforcement action"| Kill
EventProc --> Log
EventProc --> Alert
style Kernel_T fill:#c62828,color:#fff
```
Tetragon Use Cases
| Use Case | TracingPolicy Target |
|---|---|
| Detect container escape | Monitor setns, unshare syscalls |
| Block crypto mining | Kill processes connecting to mining pools |
| File integrity | Alert on writes to /etc/passwd, /etc/shadow |
| Network forensics | Log all TCP connections from a namespace |
| Privilege escalation | Detect setuid(0) calls |
eBPF Map Types Used
| Map Type | Purpose |
|---|---|
| Hash map | Policy rules, service → endpoint mapping |
| LRU hash | Conntrack entries (connection state) |
| Array | Per-CPU counters, configuration |
| Perf event array | Event export to Hubble |
| LPM trie | IP/CIDR-to-identity lookups (ipcache) used for CIDR-based policy matching |
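An LPM trie returns the entry for the most specific prefix that contains an address. The class below reproduces those lookup semantics with one dictionary per prefix length, probed from longest to shortest, rather than an actual trie. The identity values are hypothetical except 2, the reserved world identity, which 0.0.0.0/0 resolves to in this example.

```python
import ipaddress
from typing import Dict, Optional


class PrefixTable:
    """Longest-prefix-match lookups with the semantics of a BPF LPM trie.

    Uses one dict per prefix length, probed from longest to shortest,
    instead of an actual trie; the results are the same.
    """

    def __init__(self) -> None:
        self._by_len: Dict[int, Dict[ipaddress.IPv4Network, int]] = {}

    def insert(self, cidr: str, identity: int) -> None:
        net = ipaddress.IPv4Network(cidr)
        self._by_len.setdefault(net.prefixlen, {})[net] = identity

    def lookup(self, ip: str) -> Optional[int]:
        addr = ipaddress.IPv4Address(ip)
        for plen in sorted(self._by_len, reverse=True):  # most specific first
            # Mask the address down to this prefix length and probe.
            candidate = ipaddress.IPv4Network((addr, plen), strict=False)
            identity = self._by_len[plen].get(candidate)
            if identity is not None:
                return identity
        return None


ipcache = PrefixTable()
ipcache.insert("0.0.0.0/0", 2)       # reserved "world" identity
ipcache.insert("10.0.0.0/8", 1001)   # hypothetical identity for a CIDR policy
ipcache.insert("10.0.1.5/32", 257)   # hypothetical pod identity
print(ipcache.lookup("10.0.1.5"), ipcache.lookup("10.0.9.9"), ipcache.lookup("8.8.8.8"))
# 257 1001 2
```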
Benchmarks
Scope
Performance characteristics, scaling limits, and resource consumption for Cilium.
eBPF Performance
| Feature | Throughput | Latency | Notes |
|---|---|---|---|
| Pod-to-Pod (same node) | 40+ Gbps | 10-20 µs | Host-stack bypass via BPF redirect between veths (XDP not involved) |
| Pod-to-Pod (cross node) | 9.5+ Gbps | 50-100 µs | VXLAN/Geneve tunnel |
| Service load balancing | 9+ Gbps | 20-50 µs | Maglev consistent hashing (optional; default backend selection is random) |
| kube-proxy replacement | +5-15% vs iptables | 20-40% lower than iptables | eBPF socket-level LB |
Scaling Limits
| Dimension | Limit | Notes |
|---|---|---|
| Nodes per cluster | 5,000+ | Tested by Isovalent |
| Endpoints per node | 1,000+ | eBPF map capacity |
| Network policies | 100,000+ | CiliumNetworkPolicy |
| Identities (security) | 65,535 | Per-cluster identity space |
Resource Consumption
| Cluster Size | Agent CPU | Agent Memory | Operator Memory |
|---|---|---|---|
| < 100 nodes | 100-200m | 256Mi | 128Mi |
| 100-500 nodes | 200-500m | 512Mi | 256Mi |
| 500+ nodes | 500m-1000m | 1Gi+ | 512Mi |
Sourcing Status
Unsourced Performance Data
The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.
Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.