Architecture¶
Docker uses a layered architecture with a client-daemon separation. The Docker CLI communicates with the Docker daemon (dockerd), which delegates container lifecycle management to containerd, which in turn calls low-level runtimes like runc. Image building is handled by BuildKit.
Component Overview¶
graph TD
CLI["docker CLI"] -->|REST API / Unix socket| DAEMON["dockerd"]
DAEMON -->|gRPC| CONTAINERD["containerd"]
CONTAINERD -->|OCI runtime| RUNTIME["runc / runhcs"]
DAEMON -->|build API| BUILDKIT["BuildKit"]
BUILDKIT -->|pull layers| REGISTRY["Container Registry"]
CONTAINERD -->|snapshots| STORAGE["overlay2 / containerd image store"]
DAEMON -->|manage| NETWORK["Network Drivers"]
DAEMON -->|manage| VOLUME["Volume Drivers"]
Core Components¶
dockerd (Docker Daemon)¶
The Docker daemon (dockerd) is the central management process. By default it listens for Docker Engine API requests on a Unix socket (/var/run/docker.sock); it can optionally be configured to listen on a TCP port as well. Key responsibilities:
- Manage Docker objects: images, containers, networks, and volumes
- Communicate with other daemons to manage Docker services (Swarm)
- Delegate container lifecycle to containerd
- Handle image building via BuildKit integration
- Serve the Docker Engine API (REST)
Configuration
The daemon is configured via /etc/docker/daemon.json. Common settings include storage-driver, log-driver, registry-mirrors, and userns-remap.
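As a rough illustration of what such a file can contain, the sketch below renders the four settings named above as JSON. The values are examples only, and the mirror URL is a placeholder rather than a real endpoint.

```python
import json

# Example values only. The mirror URL is a placeholder, not a real endpoint.
daemon_config = {
    "storage-driver": "overlay2",
    "log-driver": "json-file",
    "registry-mirrors": ["https://mirror.example.com"],
    "userns-remap": "default",
}


def render_daemon_json(config: dict) -> str:
    """Serialize settings in the shape expected in /etc/docker/daemon.json."""
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_daemon_json(daemon_config))
```

The daemon has to be restarted (or reloaded, for the settings that support it) before changes to this file take effect.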
containerd¶
containerd is an industry-standard container runtime embedded within Docker. It handles the low-level container lifecycle:
- Pull and push container images
- Manage image snapshots and metadata
- Create and supervise containers via OCI runtimes
- Manage task execution and process monitoring
- Expose a gRPC API for higher-level tools
Starting with Docker Engine 29.0, the containerd image store is the default storage backend on fresh installations, replacing the legacy graph drivers. This uses snapshotters instead of classic storage drivers.
runc¶
runc is the reference implementation of the OCI runtime specification. It is the lowest layer that actually creates and runs containers:
- Spawn containers from OCI bundles
- Configure namespaces (PID, network, mount, UTS, IPC, user)
- Set up cgroups for resource limits
- Execute the container process
Docker can also use alternative OCI runtimes (e.g., kata-containers for VM-based isolation, runhcs on Windows).
BuildKit¶
BuildKit is the next-generation build engine, replacing the legacy Docker builder. It provides:
- Parallel build execution of independent stages
- Better caching mechanisms (cache mounts, inline cache export)
- Multi-platform builds via docker buildx
- Support for alternative frontends (e.g., BuildKit-specific Dockerfile syntax)
- Secrets and SSH forwarding during builds
BuildKit
BuildKit is the default builder since Docker Engine 23.0. It is activated via the docker buildx CLI or automatically by docker build.
Image Layer System and overlay2¶
Docker images are composed of read-only layers stacked on top of each other. Dockerfile instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; other instructions only add image metadata. When a container starts, Docker adds a thin writable layer on top.
graph BT
BASE["Base Image Layer<br/>(e.g. ubuntu:22.04)"] --> L1["Layer 1: apt-get install"]
L1 --> L2["Layer 2: COPY app /app"]
L2 --> L3["Layer 3: RUN build"]
L3 --> WRITABLE["Container Writable Layer<br/>(thin, ephemeral)"]
style WRITABLE fill:#f9f,stroke:#333
style BASE fill:#bbf,stroke:#333
overlay2 Storage Driver¶
overlay2 is the preferred classic storage driver on all supported Linux distributions, and it remains the default wherever the containerd image store is not in use. It uses the OverlayFS kernel filesystem:
- Lower directories: Read-only image layers, stacked from base to top
- Upper directory: The container writable layer (thin RW layer)
- Merged directory: The unified view presented to the container
- Work directory: Used by OverlayFS for internal operations
Layers are stored under /var/lib/docker/overlay2/, with each layer in its own directory. The l subdirectory contains shortened symbolic links to the layer directories; the short names keep the arguments of the mount command within the kernel's page-size limit.
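The four directories above map onto the options of an OverlayFS mount. The sketch below only builds that option string; it does not mount anything. The layer paths are hypothetical, and the ordering follows the kernel's OverlayFS convention that the left-most lowerdir entry is the top-most layer.

```python
def overlay_mount_options(layers_base_to_top, upperdir, workdir):
    """Build the option string for an OverlayFS mount.

    OverlayFS expects lowerdir entries ordered top-most first, so the
    base-to-top list is reversed before joining with ':'.
    """
    if not layers_base_to_top:
        raise ValueError("at least one lower layer is required")
    lowerdir = ":".join(reversed(layers_base_to_top))
    return f"lowerdir={lowerdir},upperdir={upperdir},workdir={workdir}"


# Hypothetical directory names, chosen for readability.
opts = overlay_mount_options(
    ["/layers/base", "/layers/apt", "/layers/app"],
    upperdir="/container/diff",
    workdir="/container/work",
)
print(opts)
```

The resulting string is what would follow -o in a mount -t overlay invocation, with the merged directory as the mount point.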
Containerd Image Store
Docker Engine 29.0+ uses the containerd image store by default on fresh installs, which replaces overlay2 graph drivers with containerd snapshotters. Upgraded installations retain overlay2 until explicitly migrated.
Layer Sharing and Efficiency¶
- Layers are content-addressed by SHA256 digest
- Identical layers are shared across images (deduplication)
- Pulling a new image only downloads missing layers
- docker image history shows the layer stack for any image
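The points above can be sketched in a few lines. This is a toy model, not Docker's implementation: short byte strings stand in for layer archives, and the local store is a plain set of digests.

```python
import hashlib


def layer_digest(blob: bytes) -> str:
    """Content address of a layer: identical bytes always give the same digest."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def layers_to_download(image_layers, local_store):
    """Return the digests, in image order, that are not stored locally yet."""
    return [digest for digest in image_layers if digest not in local_store]


# Toy stand-ins for layer archives.
base = layer_digest(b"base os files")
deps = layer_digest(b"dependencies")
app_v1 = layer_digest(b"app v1")
app_v2 = layer_digest(b"app v2")

local_store = {base, deps, app_v1}   # layers already pulled for version 1
image_v2 = [base, deps, app_v2]      # version 2 reuses the two lower layers

missing = layers_to_download(image_v2, local_store)
print(len(missing), "layer(s) to download")
```

Because the two lower layers are already present, only the changed application layer would be fetched.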
Networking¶
Docker provides several network drivers for container connectivity:
Bridge (default)¶
- Creates a private internal network on the host (docker0 bridge)
- Containers get private IP addresses from an internal subnet
- Port mapping (-p host:container) exposes services externally
- User-defined bridges enable automatic DNS resolution between containers
- The default bridge does NOT support automatic DNS (containers communicate via IP only)
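To make the addressing concrete, the sketch below hands out addresses from a bridge subnet. It is a simplification of what Docker's address management does: it assumes the first usable address goes to the bridge gateway and the following ones go to containers in start order, and it uses 172.17.0.0/16 as the example subnet.

```python
import ipaddress
from itertools import islice


def allocate_bridge_addresses(subnet: str, count: int):
    """Return (gateway, container_ips) for a bridge subnet.

    Simplified model: the first usable host address is the gateway and
    the next `count` addresses are assigned to containers in order.
    """
    network = ipaddress.ip_network(subnet)
    hosts = iter(network.hosts())
    gateway = next(hosts)
    containers = list(islice(hosts, count))
    return gateway, containers


gateway, containers = allocate_bridge_addresses("172.17.0.0/16", 2)
print("gateway:", gateway)
for index, address in enumerate(containers, start=1):
    print(f"container {index}: {address}")
```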
Host¶
- Removes network isolation between container and host
- Container shares the host network stack directly
- No port mapping needed; services bind directly to host interfaces
- Use case: high-performance networking where NAT overhead is unacceptable
Overlay¶
- Connects multiple Docker daemons across different hosts (Swarm mode)
- Enables container-to-container communication across hosts without OS-level routing
- Built on VXLAN, with optional IPsec encryption of traffic between hosts
- Network state is kept in Swarm's Raft consensus store (older releases relied on an external key-value store)
Macvlan¶
- Assigns a real MAC address to each container
- Containers appear as physical devices on the network
- Traffic routed by MAC address, bypassing the Docker bridge
- Supports 802.1Q VLAN trunking via sub-interfaces (e.g., eth0.50)
- Use case: legacy applications expecting direct network presence
IPvlan¶
- Similar to macvlan but shares the host MAC address
- Each container gets its own IP address
- L2 mode (same subnet) or L3 mode (routed between subnets)
- Useful when switch port security limits MAC addresses
None¶
- Disables all networking for the container
- Only the loopback interface is available
- Use case: isolated batch jobs, security-sensitive workloads
Volume Drivers and Storage¶
Docker volumes provide persistent data storage that outlives container lifecycle:
| Type | Command | Scope |
|---|---|---|
| Volume | -v myvol:/data | Managed by Docker, stored in /var/lib/docker/volumes/ |
| Bind mount | -v /host/path:/data | Maps directly to host filesystem path |
| tmpfs | --tmpfs /data | In-memory only, non-persistent |
| Named pipe | (Windows) | Named pipe on Windows hosts |
Volume drivers extend Docker to store data on remote hosts, cloud providers, or other storage backends. Third-party plugins support NFS, AWS EFS, Azure File Storage, and more.
Data Safety
Data in volumes persists independently of containers. Removing a container does NOT remove its named volumes. Use docker volume prune to remove unused anonymous volumes; add --all to also remove unused named volumes.
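The difference between a named volume and a bind mount in the -v syntax comes down to the source part of the argument. The classifier below is a simplified sketch that assumes Linux-style paths, where a source starting with / is a host path. It does not cover Windows drive letters or the --mount syntax, and the function name is hypothetical.

```python
def classify_volume_spec(spec: str) -> dict:
    """Classify a -v argument of the form [source:]target[:options].

    Simplified rules (Linux paths only):
      - no source            -> anonymous volume
      - source starts with / -> bind mount of a host path
      - anything else        -> named volume managed by Docker
    """
    parts = spec.split(":")
    if len(parts) == 1:
        return {"type": "anonymous volume", "source": None,
                "target": parts[0], "options": []}
    source, target, options = parts[0], parts[1], parts[2:]
    kind = "bind mount" if source.startswith("/") else "named volume"
    return {"type": kind, "source": source, "target": target, "options": options}


for example in ("myvol:/data", "/host/path:/data", "/host/path:/data:ro", "/data"):
    print(example, "->", classify_volume_spec(example)["type"])
```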
Inter-Component Communication¶
sequenceDiagram
participant User
participant CLI as docker CLI
participant Daemon as dockerd
participant BuildKit as BuildKit
participant containerd as containerd
participant runc as runc
participant Registry as Registry
Note over User,Registry: Image Build Flow
User->>CLI: docker build -t myapp .
CLI->>Daemon: POST /build
Daemon->>BuildKit: build request
BuildKit->>Registry: pull base image layers
BuildKit->>BuildKit: execute Dockerfile stages
BuildKit->>containerd: store image layers
containerd-->>Daemon: image stored
Daemon-->>CLI: build complete
CLI-->>User: image ID
Note over User,Registry: Container Run Flow
User->>CLI: docker run myapp
CLI->>Daemon: POST /containers/create
Daemon->>containerd: create container
containerd->>containerd: prepare snapshot (overlay2)
containerd->>runc: create OCI bundle + run
runc->>runc: setup namespaces + cgroups
runc-->>containerd: container started
containerd-->>Daemon: container running
Daemon-->>CLI: container ID
CLI-->>User: output
Key File Locations¶
| Path | Purpose |
|---|---|
| /var/run/docker.sock | Docker daemon Unix socket |
| /etc/docker/daemon.json | Daemon configuration file |
| /var/lib/docker/overlay2/ | Image and container layers (overlay2 driver) |
| /var/lib/docker/volumes/ | Named volumes |
| /var/lib/docker/network/ | Network configuration and state |
| /var/lib/containerd/ | containerd data (when using containerd image store) |
| ~/.docker/config.json | User-level CLI configuration |
How It Works¶
Core mechanisms, container lifecycle, image layer system, and networking internals.
Container Lifecycle¶
stateDiagram-v2
[*] --> Created: docker create
Created --> Running: docker start
Running --> Paused: docker pause
Paused --> Running: docker unpause
Running --> Stopped: docker stop (SIGTERM → SIGKILL)
Stopped --> Running: docker start
Stopped --> Removed: docker rm
Running --> Removed: docker rm -f
Removed --> [*]
Running --> Restarting: crash / restart policy
Restarting --> Running: restart
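The command-driven transitions in the diagram can be written down as a lookup table. The sketch below models only those edges. It leaves out the automatic Restarting transitions, and real Docker accepts a few combinations that the diagram does not show.

```python
# (current state, command) -> next state, taken from the diagram above.
TRANSITIONS = {
    ("Created", "docker start"): "Running",
    ("Running", "docker pause"): "Paused",
    ("Paused", "docker unpause"): "Running",
    ("Running", "docker stop"): "Stopped",
    ("Stopped", "docker start"): "Running",
    ("Stopped", "docker rm"): "Removed",
    ("Running", "docker rm -f"): "Removed",
}


def apply_command(state: str, command: str) -> str:
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command!r} is not valid from state {state!r}") from None


def run_sequence(commands, state="Created"):
    """Apply commands in order, starting from the state after docker create."""
    for command in commands:
        state = apply_command(state, command)
    return state


final_state = run_sequence(
    ["docker start", "docker pause", "docker unpause", "docker stop", "docker rm"]
)
print(final_state)
```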
Image Layer System¶
Docker images use a union filesystem (OverlayFS through the overlay2 driver by default) that stacks read-only layers. Each Dockerfile instruction that changes the filesystem (RUN, COPY, ADD) creates a new layer. When a container runs, a thin writable layer is added on top.
flowchart TB
subgraph Image["Image Layers (read-only)"]
L1["Layer 1: Base OS\n(ubuntu:24.04)"]
L2["Layer 2: apt-get install\n(python, pip)"]
L3["Layer 3: COPY app/\n(application code)"]
L4["Layer 4: RUN pip install\n(dependencies)"]
end
subgraph Container["Container Layer (read-write)"]
RW["Writable Layer\n(runtime state, logs, temp files)"]
end
RW --> L4 --> L3 --> L2 --> L1
style Container fill:#0db7ed,color:#fff
style Image fill:#1565c0,color:#fff
Copy-on-Write (CoW)¶
- When a container modifies a file from a lower layer, it is copied up to the writable layer
- Original layers remain unchanged → multiple containers share the same base layers
- Deleting a file in the container creates a whiteout entry, hiding the lower layer file
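These three rules can be modelled with dictionaries standing in for layers. This is a toy model of the semantics only, not of how OverlayFS is implemented; the class and its methods are hypothetical.

```python
WHITEOUT = object()  # marker that hides a file provided by a lower layer


class LayeredFS:
    """Read-only image layers shared by reference, plus a private upper layer."""

    def __init__(self, image_layers):
        self.lower = image_layers  # list of dicts, base first; never modified
        self.upper = {}            # the container's writable layer

    def read(self, path):
        if path in self.upper:
            if self.upper[path] is WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        for layer in reversed(self.lower):  # top-most image layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-up: the current content lands in the upper layer, then changes.
        self.upper[path] = self.read(path) + data

    def delete(self, path):
        self.read(path)              # raises if the file is not visible
        self.upper[path] = WHITEOUT  # lower layers keep their copy


image = [{"/etc/motd": "hello\n"}, {"/app/main.py": "print('hi')\n"}]
c1, c2 = LayeredFS(image), LayeredFS(image)

c1.append("/etc/motd", "edited\n")
c1.delete("/app/main.py")
print(repr(c1.read("/etc/motd")), repr(c2.read("/etc/motd")))
```

Both containers reference the same image layers, so the change and the deletion made in the first container are invisible to the second.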
BuildKit Pipeline¶
sequenceDiagram
participant User as Developer
participant CLI as docker CLI
participant BK as BuildKit
participant Registry as Registry
User->>CLI: docker build .
CLI->>BK: Send Dockerfile + context
BK->>BK: Parse Dockerfile → DAG
BK->>BK: Resolve cache (local/registry)
par Parallel Layer Builds
BK->>BK: Stage 1 (base image)
BK->>BK: Stage 2 (dependencies)
BK->>BK: Stage 3 (application)
end
BK->>BK: Merge stages → final image
BK->>Registry: Push (if --push)
BK->>CLI: Return image ID
CLI->>User: Successfully built
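The idea of turning a Dockerfile into a dependency graph and running independent stages concurrently can be sketched with a topological sort. This is not BuildKit's scheduler: the stage names are hypothetical, and each batch simply lists stages whose dependencies are already complete.

```python
from graphlib import TopologicalSorter


def parallel_batches(stage_deps):
    """Group stages into batches; stages within a batch do not depend on each other.

    stage_deps maps each stage to the set of stages it depends on.
    """
    sorter = TopologicalSorter(stage_deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical multi-stage build: two stages both start from "base".
stages = {
    "base": set(),
    "deps": {"base"},
    "assets": {"base"},
    "final": {"deps", "assets"},
}
print(parallel_batches(stages))
# [['base'], ['assets', 'deps'], ['final']]
```

Here deps and assets land in the same batch, so a scheduler would be free to build them at the same time before assembling final.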
Key BuildKit Features¶
| Feature | Detail |
|---|---|
| Parallel builds | Independent stages execute concurrently |
| Cache mounts | --mount=type=cache for package manager caches |
| Secret mounts | --mount=type=secret — never persisted in layers |
| Multi-platform | Build for arm64, amd64, etc. in a single command |
| Registry cache | --cache-to type=registry for CI pipelines |
Networking Model¶
flowchart LR
subgraph Host["Docker Host"]
subgraph Bridge["docker0 bridge (172.17.0.0/16)"]
C1["Container 1\n172.17.0.2"]
C2["Container 2\n172.17.0.3"]
end
subgraph UserNet["user-network (10.0.0.0/24)"]
C3["Container 3\n10.0.0.2"]
C4["Container 4\n10.0.0.3"]
end
IPTABLES["iptables / nftables\n(NAT, port mapping)"]
end
Internet["Internet"] <-->|"port mapping\n-p 8080:80"| IPTABLES
IPTABLES <--> Bridge
IPTABLES <--> UserNet
style Bridge fill:#0db7ed,color:#fff
style UserNet fill:#2e7d32,color:#fff
Network Drivers¶
| Driver | Use Case |
|---|---|
| bridge | Default single-host networking; containers on same host communicate via veth pairs |
| host | Container shares host network namespace; no isolation, maximum performance |
| overlay | Multi-host networking via VXLAN; used by Swarm |
| macvlan | Container gets own MAC address; appears as physical device on network |
| ipvlan | Like macvlan but shares host MAC; L2 or L3 mode |
| none | No external networking; only the loopback interface is available |
Storage Model¶
| Storage Type | Lifecycle | Use Case |
|---|---|---|
| Union FS layers | Tied to container | Ephemeral container filesystem |
| Named volumes | Independent of container | Databases, persistent state |
| Bind mounts | Host path → container | Development (live code reload) |
| tmpfs | In-memory only | Temporary sensitive data |
| Volume plugins | Driver-dependent | NFS, EBS, Ceph, etc. |
Benchmarks¶
Scope
Performance characteristics, scaling limits, and resource consumption for Docker.
Container Performance¶
| Metric | Docker | Native | Overhead |
|---|---|---|---|
| CPU | Near-native | Baseline | < 1% |
| Memory | Near-native | Baseline | 10-30MB per container |
| Network (bridge) | 90-95% native | Baseline | 5-10% overhead |
| Network (host) | Near-native | Baseline | < 1% |
| Disk I/O (overlay2) | 85-95% native | Baseline | 5-15% overhead |
| Disk I/O (bind mount) | Near-native | Baseline | < 1% |
Image Size Benchmarks¶
| Base Image | Size | Use Case |
|---|---|---|
| scratch | 0MB | Static Go binaries |
| alpine | 7MB | Minimal Linux |
| distroless | 15-25MB | Secure, no shell |
| debian-slim | 80MB | Compatibility needed |
| ubuntu | 77MB | Full Linux tools |
| node:22-alpine | 130MB | Node.js apps |
| python:3.12-slim | 150MB | Python apps |
Build Performance¶
| Strategy | Cold Build | Cached Build | Size Impact |
|---|---|---|---|
| Single stage | 30-120s | 5-30s | Large (500MB+) |
| Multi-stage | 60-180s | 10-30s | Small (50-150MB) |
| BuildKit cache | 60-180s | 3-10s | Same |
| Buildx bake | 30-90s | 3-10s | Parallel builds |
Compose Scaling¶
| Containers | Memory (runtime) | Startup Time | Notes |
|---|---|---|---|
| 10 | 100-500MB | 5-15s | Typical dev setup |
| 50 | 500MB-2GB | 15-45s | Medium application |
| 100 | 1-5GB | 30-120s | Large stack |
Sourcing Status¶
Unsourced Performance Data
The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.
Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.
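A minimal timing harness along these lines could be a starting point for such measurements. It is a sketch, not a rigorous benchmark: it measures wall-clock time only, and the command in the example is a placeholder that launches the Python interpreter. Substitute the command under test, for example a container start.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce raw timings to a few robust summary figures."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Placeholder workload; replace with the command you want to measure.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(summarize(samples))
```

Recording the hardware, software versions, and exact command next to the results addresses the gaps noted above.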