Architecture

Docker uses a layered architecture with a client-daemon separation. The Docker CLI communicates with the Docker daemon (dockerd), which delegates container lifecycle management to containerd, which in turn calls low-level runtimes like runc. Image building is handled by BuildKit.

See also: infrastructure/docker/index, infrastructure/docker/architecture, infrastructure/docker/operations, infrastructure/docker/security

Component Overview

graph TD
    CLI["docker CLI"] -->|REST API / Unix socket| DAEMON["dockerd"]
    DAEMON -->|gRPC| CONTAINERD["containerd"]
    CONTAINERD -->|OCI runtime| RUNTIME["runc / runhcs"]
    DAEMON -->|build API| BUILDKIT["BuildKit"]
    BUILDKIT -->|pull layers| REGISTRY["Container Registry"]
    CONTAINERD -->|snapshots| STORAGE["overlay2 / containerd image store"]
    DAEMON -->|manage| NETWORK["Network Drivers"]
    DAEMON -->|manage| VOLUME["Volume Drivers"]

Core Components

dockerd (Docker Daemon)

The Docker daemon (dockerd) is the central management process. It listens for Docker API requests on a Unix socket (/var/run/docker.sock) or TCP port. Key responsibilities:

  • Manage Docker objects: images, containers, networks, and volumes
  • Communicate with other daemons to manage Docker services (Swarm)
  • Delegate container lifecycle to containerd
  • Handle image building via BuildKit integration
  • Serve the Docker Engine API (REST)
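
To make the client-daemon split concrete, here is a minimal sketch of a client that speaks HTTP to the daemon over its Unix socket using only the Python standard library. The class name UnixHTTPConnection and the helper engine_version are illustrative names, not part of any Docker SDK; GET /version is an Engine API endpoint, and the socket path is the one given above.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP dial with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def engine_version(socket_path="/var/run/docker.sock"):
    """Return the parsed JSON body of GET /version from the Engine API."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

On a host with a running daemon and sufficient permissions on the socket, engine_version() should return a dictionary describing the engine; without a daemon the call raises an OSError.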

Configuration

The daemon is configured via /etc/docker/daemon.json. Common settings include storage-driver, log-driver, registry-mirrors, and userns-remap.
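
As a rough illustration of the settings named above, the following sketch renders a daemon.json document from a Python dictionary. The values are examples only, and the mirror URL is a placeholder rather than a real registry mirror; adjust them to your environment before writing anything to /etc/docker/daemon.json.

```python
import json

# Illustrative values only; the mirror URL is a placeholder.
daemon_config = {
    "storage-driver": "overlay2",
    "log-driver": "json-file",
    "registry-mirrors": ["https://mirror.example.com"],
    "userns-remap": "default",
}


def render_daemon_json(config):
    """Serialize the configuration as pretty-printed JSON."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


print(render_daemon_json(daemon_config))
```

The daemon must be restarted (or reloaded, for the options that support it) before a changed file takes effect.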

containerd

containerd is an industry-standard container runtime that Docker Engine bundles and runs as a separate daemon. It handles the low-level container lifecycle:

  • Pull and push container images
  • Manage image snapshots and metadata
  • Create and supervise containers via OCI runtimes
  • Manage task execution and process monitoring
  • Expose a gRPC API for higher-level tools

Starting with Docker Engine 29.0, the containerd image store is the default storage backend on fresh installations, replacing the legacy graph drivers. The containerd image store manages layers through snapshotters instead of classic storage drivers.

runc

runc is the reference implementation of the OCI runtime specification. It is the lowest layer that actually creates and runs containers:

  • Spawn containers from OCI bundles
  • Configure namespaces (PID, network, mount, UTS, IPC, user)
  • Set up cgroups for resource limits
  • Execute the container process
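
The namespace and cgroup settings listed above reach runc through the bundle's config.json. The sketch below builds only the relevant part of the linux section; the field names follow the OCI runtime specification, while the helper name linux_section and the limit values are illustrative. It is a fragment, not a complete bundle configuration (for example, a real user namespace also needs UID/GID mappings).

```python
import json

# Namespace types as named in the OCI runtime spec's linux.namespaces list.
NAMESPACE_TYPES = ["pid", "network", "mount", "uts", "ipc", "user"]


def linux_section(memory_limit_bytes, cpu_shares):
    """Build a partial 'linux' section: namespaces plus cgroup resource limits."""
    return {
        "namespaces": [{"type": ns_type} for ns_type in NAMESPACE_TYPES],
        "resources": {
            "memory": {"limit": memory_limit_bytes},
            "cpu": {"shares": cpu_shares},
        },
    }


# Example values: a 256 MiB memory limit and the conventional 1024 CPU shares.
print(json.dumps({"linux": linux_section(256 * 1024 * 1024, 1024)}, indent=2))
```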

Docker can also use alternative OCI runtimes (e.g., kata-containers for VM-based isolation, runhcs on Windows).

BuildKit

BuildKit is the next-generation build engine, replacing the legacy Docker builder. It provides:

  • Parallel build execution of independent stages
  • Better caching mechanisms (cache mounts, inline cache export)
  • Multi-platform builds via docker buildx
  • Support for alternative frontends (e.g., custom Dockerfile syntaxes selected with the # syntax= directive)
  • Secrets and SSH forwarding during builds

BuildKit has been the default builder on Linux since Docker Engine 23.0. docker build uses it automatically, and docker buildx exposes its extended features.

Image Layer System and overlay2

Docker images are composed of read-only layers stacked on top of each other. Each Dockerfile instruction that changes the filesystem (such as RUN, COPY, or ADD) creates a new layer; other instructions only update image metadata. When a container starts, Docker adds a thin writable layer on top.

graph BT
    BASE["Base Image Layer<br/>(e.g. ubuntu:22.04)"] --> L1["Layer 1: apt-get install"]
    L1 --> L2["Layer 2: COPY app /app"]
    L2 --> L3["Layer 3: RUN build"]
    L3 --> WRITABLE["Container Writable Layer<br/>(thin, ephemeral)"]
    style WRITABLE fill:#f9f,stroke:#333
    style BASE fill:#bbf,stroke:#333

overlay2 Storage Driver

overlay2 is the preferred classic storage driver on all supported Linux distributions, and the default wherever the containerd image store is not in use. It uses the OverlayFS kernel filesystem:

  • Lower directories: Read-only image layers, stacked from base to top
  • Upper directory: The container writable layer (thin RW layer)
  • Merged directory: The unified view presented to the container
  • Work directory: Used by OverlayFS for internal operations

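Layers are stored under /var/lib/docker/overlay2/, with each layer in its own directory. The l subdirectory contains symbolic links with shortened identifiers that point to the layer directories; the short names keep the mount command's arguments under the kernel's page-size limit.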
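
The four directory roles map directly onto OverlayFS mount options. The following sketch assembles that option string; the function name and the example paths are hypothetical, and Docker's own driver builds the string internally. OverlayFS expects lowerdir entries ordered from topmost to bottommost, so a base-first list has to be reversed.

```python
def overlay_mount_options(lower_layers, upper_dir, work_dir):
    """Build an OverlayFS option string.

    lower_layers is ordered base-first; OverlayFS wants the topmost
    read-only layer first, so the list is reversed before joining.
    """
    lowerdir = ":".join(reversed(lower_layers))
    return f"lowerdir={lowerdir},upperdir={upper_dir},workdir={work_dir}"


# Hypothetical paths, for illustration only.
options = overlay_mount_options(
    ["/layers/base", "/layers/app"],
    "/container/diff",
    "/container/work",
)
print(options)
```

The resulting string is what would be passed after -o in a mount -t overlay invocation, with the merged directory as the mount point.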

Containerd Image Store

Docker Engine 29.0+ uses the containerd image store by default on fresh installs, which replaces overlay2 graph drivers with containerd snapshotters. Upgraded installations retain overlay2 until explicitly migrated.

Layer Sharing and Efficiency

  • Layers are content-addressed by SHA256 digest
  • Identical layers are shared across images (deduplication)
  • Pulling a new image only downloads missing layers
  • docker image history shows the layer stack for any image
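
A small sketch of why content addressing gives deduplication for free: if a layer's identity is the SHA256 of its bytes, two images that contain the same layer reference the same digest, and a pull only needs the digests not already present. The byte strings below stand in for layer archives, and the function names are illustrative.

```python
import hashlib


def layer_digest(data: bytes) -> str:
    """Identify a layer by the SHA256 digest of its content."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def missing_layers(image_layers, local_store):
    """Return the digests that still need to be downloaded, in image order."""
    return [digest for digest in image_layers if digest not in local_store]


# Stand-in layer contents; real layers are tar archives.
base = layer_digest(b"base os files")
deps = layer_digest(b"python runtime")
app_a = layer_digest(b"app A code")
app_b = layer_digest(b"app B code")

image_a = [base, deps, app_a]
image_b = [base, deps, app_b]

local_store = set(image_a)  # image A was pulled earlier
print(missing_layers(image_b, local_store))  # only app B's own layer
```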

Networking

Docker provides several network drivers for container connectivity:

Bridge (default)

  • Creates a private internal network on the host (docker0 bridge)
  • Containers get private IP addresses from an internal subnet
  • Port mapping (-p host:container) exposes services externally
  • User-defined bridges enable automatic DNS resolution between containers
  • The default bridge does NOT support automatic DNS (containers communicate via IP only)
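
To illustrate how addresses on a bridge subnet are laid out, the sketch below uses 172.17.0.0/16, the subnet docker0 commonly uses, reserves the first host address for the bridge gateway, and hands out the following ones to containers. Strictly sequential assignment is a simplifying assumption here; Docker's IPAM driver decides the actual addresses.

```python
import ipaddress


def allocate_addresses(subnet, count):
    """Return (gateway, [container addresses]) for a bridge subnet.

    Assumes the gateway takes the first usable address and containers
    receive the next ones in order.
    """
    hosts = ipaddress.ip_network(subnet).hosts()
    gateway = next(hosts)
    containers = [next(hosts) for _ in range(count)]
    return gateway, containers


gateway, containers = allocate_addresses("172.17.0.0/16", 2)
print(gateway, [str(address) for address in containers])
```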

Host

  • Removes network isolation between container and host
  • Container shares the host network stack directly
  • No port mapping needed; services bind directly to host interfaces
  • Use case: high-performance networking where NAT overhead is unacceptable

Overlay

  • Connects multiple Docker daemons across different hosts (Swarm mode)
  • Enables container-to-container communication across hosts without OS-level routing
  • Built on VXLAN, with optional IPsec encryption of the data plane
  • Network state is managed by Swarm's control plane (Raft consensus among managers); older releases could use an external key-value store instead

Macvlan

  • Assigns a real MAC address to each container
  • Containers appear as physical devices on the network
  • Traffic routed by MAC address, bypassing the Docker bridge
  • Supports 802.1Q VLAN trunking via sub-interfaces (e.g., eth0.50)
  • Use case: legacy applications expecting direct network presence

IPvlan

  • Similar to macvlan but shares the host MAC address
  • Each container gets its own IP address
  • L2 mode (same subnet) or L3 mode (routed between subnets)
  • Useful when switch port security limits MAC addresses

None

  • Disables all networking for the container
  • Only the loopback interface is available
  • Use case: isolated batch jobs, security-sensitive workloads

Volume Drivers and Storage

Docker volumes provide persistent data storage that outlives container lifecycle:

| Type | Command | Description |
|---|---|---|
| Volume | -v myvol:/data | Managed by Docker, stored in /var/lib/docker/volumes/ |
| Bind mount | -v /host/path:/data | Maps directly to a host filesystem path |
| tmpfs | --tmpfs /data | In-memory only, non-persistent |
| Named pipe (Windows) | | Named pipe on Windows hosts |
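
The difference between the first two rows comes down to the source part of the -v argument. As a simplified sketch (Linux-style paths only, and not Docker's actual parser), a source that is an absolute path is treated as a bind mount and anything else as a volume name; the function name classify_mount is hypothetical.

```python
def classify_mount(spec):
    """Classify a -v argument of the form [source:]target[:options].

    Simplification: Windows drive-letter paths are not handled.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        # Only a target was given: Docker creates an anonymous volume.
        return {"type": "volume", "source": None, "target": parts[0], "options": []}
    source, target = parts[0], parts[1]
    options = parts[2].split(",") if len(parts) > 2 else []
    kind = "bind" if source.startswith("/") else "volume"
    return {"type": kind, "source": source, "target": target, "options": options}


print(classify_mount("myvol:/data"))
print(classify_mount("/host/path:/data:ro"))
```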

Volume drivers extend Docker to store data on remote hosts, cloud providers, or other storage backends. Third-party plugins support NFS, AWS EFS, Azure File Storage, and more.

Data Safety

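Data in volumes persists independently of containers. Removing a container does NOT remove its named volumes; anonymous volumes are removed only when docker rm -v or docker run --rm is used. Use docker volume prune to clean up unused anonymous volumes, and add --all (Docker Engine 23.0 and later) to include unused named volumes.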

Inter-Component Communication

sequenceDiagram
    participant User
    participant CLI as docker CLI
    participant Daemon as dockerd
    participant BuildKit as BuildKit
    participant containerd as containerd
    participant runc as runc
    participant Registry as Registry

    Note over User,Registry: Image Build Flow
    User->>CLI: docker build -t myapp .
    CLI->>Daemon: POST /build
    Daemon->>BuildKit: build request
    BuildKit->>Registry: pull base image layers
    BuildKit->>BuildKit: execute Dockerfile stages
    BuildKit->>containerd: store image layers
    containerd-->>Daemon: image stored
    Daemon-->>CLI: build complete
    CLI-->>User: image ID

    Note over User,Registry: Container Run Flow
    User->>CLI: docker run myapp
    CLI->>Daemon: POST /containers/create
    Daemon->>containerd: create container
    containerd->>containerd: prepare snapshot (overlay2)
    containerd->>runc: create OCI bundle + run
    runc->>runc: setup namespaces + cgroups
    runc-->>containerd: container started
    containerd-->>Daemon: container running
    Daemon-->>CLI: container ID
    CLI-->>User: output
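
In the run flow above, docker run is effectively a create call followed by a start call against the Engine API. The sketch below only constructs those two requests as (method, path, body) tuples so they can be inspected without a daemon; the helper names and the container ID abc123 are hypothetical, while the endpoint paths are the Engine API's.

```python
import json


def create_request(image, cmd=None):
    """Describe the POST /containers/create call for an image."""
    body = {"Image": image}
    if cmd:
        body["Cmd"] = list(cmd)
    return ("POST", "/containers/create", json.dumps(body))


def start_request(container_id):
    """Describe the follow-up call that starts the created container."""
    return ("POST", f"/containers/{container_id}/start", None)


# The ID would normally come from the "Id" field of the create response.
for request in (create_request("myapp"), start_request("abc123")):
    print(request)
```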

Key File Locations

| Path | Purpose |
|---|---|
| /var/run/docker.sock | Docker daemon Unix socket |
| /etc/docker/daemon.json | Daemon configuration file |
| /var/lib/docker/overlay2/ | Image and container layers (overlay2 driver) |
| /var/lib/docker/volumes/ | Named volumes |
| /var/lib/docker/network/ | Network configuration and state |
| /var/lib/containerd/ | containerd data; also holds image content and snapshots when the containerd image store is in use |
| ~/.docker/config.json | User-level CLI configuration |


How It Works

Core mechanisms, container lifecycle, image layer system, and networking internals.

Container Lifecycle

stateDiagram-v2
    [*] --> Created: docker create
    Created --> Running: docker start
    Running --> Paused: docker pause
    Paused --> Running: docker unpause
    Running --> Stopped: docker stop (SIGTERM → SIGKILL)
    Stopped --> Running: docker start
    Stopped --> Removed: docker rm
    Running --> Removed: docker rm -f
    Removed --> [*]
    Running --> Restarting: crash / restart policy
    Restarting --> Running: restart
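
The command-driven part of the state diagram can be written down as a transition table. This is a sketch of the diagram only, using its state names (Docker's own status output uses names such as exited rather than Stopped), and it omits the automatic Restarting transitions.

```python
# (current state, command) -> next state, taken from the diagram above.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("stopped", "rm"): "removed",
    ("running", "rm -f"): "removed",
}


def next_state(state, command):
    """Return the state reached by running a command, or raise ValueError."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command!r} a container that is {state}") from None


state = "created"
for command in ["start", "pause", "unpause", "stop", "rm"]:
    state = next_state(state, command)
    print(f"docker {command} -> {state}")
```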

Image Layer System

Docker images stack read-only layers through a union filesystem (OverlayFS via the overlay2 driver, or a containerd snapshotter). Each Dockerfile instruction that changes the filesystem creates a new layer. When a container runs, a thin writable layer is added on top.

flowchart TB
    subgraph Image["Image Layers (read-only)"]
        L1["Layer 1: Base OS\n(ubuntu:24.04)"]
        L2["Layer 2: apt-get install\n(python, pip)"]
        L3["Layer 3: COPY app/\n(application code)"]
        L4["Layer 4: RUN pip install\n(dependencies)"]
    end

    subgraph Container["Container Layer (read-write)"]
        RW["Writable Layer\n(runtime state, logs, temp files)"]
    end

    RW --> L4 --> L3 --> L2 --> L1

    style Container fill:#0db7ed,color:#fff
    style Image fill:#1565c0,color:#fff

Copy-on-Write (CoW)

  • When a container modifies a file from a lower layer, it is copied up to the writable layer
  • Original layers remain unchanged → multiple containers share the same base layers
  • Deleting a file in the container creates a whiteout entry, hiding the lower layer file
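
The three rules above can be modelled with plain dictionaries. This is a toy model, not how OverlayFS is implemented: each layer is a dict from path to content, the class name LayeredFS is made up for the example, and a sentinel object plays the role of a whiteout entry.

```python
WHITEOUT = object()  # marks a path as deleted in the writable layer


class LayeredFS:
    def __init__(self, image_layers):
        self.lower = list(image_layers)  # read-only layers, base first
        self.upper = {}                  # this container's writable layer

    def read(self, path):
        if path in self.upper:
            if self.upper[path] is WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        for layer in reversed(self.lower):  # topmost layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-up: take the current content, modify it in the upper layer only.
        self.upper[path] = self.read(path) + data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path does not exist
        self.upper[path] = WHITEOUT


base = {"/etc/motd": "hello\n"}
first = LayeredFS([base])
second = LayeredFS([base])
first.append("/etc/motd", "patched\n")
print(repr(first.read("/etc/motd")), repr(second.read("/etc/motd")))
```

Both containers share the same base dictionary, yet only the first one sees the modification, which mirrors how several containers share image layers on disk.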

BuildKit Pipeline

sequenceDiagram
    participant User as Developer
    participant CLI as docker CLI
    participant BK as BuildKit
    participant Registry as Registry

    User->>CLI: docker build .
    CLI->>BK: Send Dockerfile + context
    BK->>BK: Parse Dockerfile → DAG
    BK->>BK: Resolve cache (local/registry)
    par Parallel Layer Builds
        BK->>BK: Stage 1 (base image)
        BK->>BK: Stage 2 (dependencies)
        BK->>BK: Stage 3 (application)
    end
    BK->>BK: Merge stages → final image
    BK->>Registry: Push (if --push)
    BK->>CLI: Return image ID
    CLI->>User: Successfully built
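
The scheduling idea behind the DAG step can be sketched with the standard library's graphlib module: stages whose dependencies are all finished form a wave that can run concurrently. The stage names and dependency map below are made up for the example, and BuildKit's real solver works on a finer-grained graph than whole stages.

```python
from graphlib import TopologicalSorter


def build_waves(stage_deps):
    """Group stages into waves; every stage in a wave can build in parallel.

    stage_deps maps a stage name to the set of stages it depends on.
    """
    sorter = TopologicalSorter(stage_deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical multi-stage build: two independent stages feed the final one.
stages = {
    "deps": {"base"},
    "assets": {"base"},
    "final": {"deps", "assets"},
}
print(build_waves(stages))
```

Here deps and assets land in the same wave because neither depends on the other, which is the situation in which independent stages execute concurrently.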

Key BuildKit Features

| Feature | Detail |
|---|---|
| Parallel builds | Independent stages execute concurrently |
| Cache mounts | --mount=type=cache for package manager caches |
| Secret mounts | --mount=type=secret; never persisted in layers |
| Multi-platform | Build for arm64, amd64, etc. in a single command |
| Registry cache | --cache-to type=registry for CI pipelines |

Networking Model

flowchart LR
    subgraph Host["Docker Host"]
        subgraph Bridge["docker0 bridge (172.17.0.0/16)"]
            C1["Container 1\n172.17.0.2"]
            C2["Container 2\n172.17.0.3"]
        end
        subgraph UserNet["user-network (10.0.0.0/24)"]
            C3["Container 3\n10.0.0.2"]
            C4["Container 4\n10.0.0.3"]
        end
        IPTABLES["iptables / nftables\n(NAT, port mapping)"]
    end

    Internet["Internet"] <-->|"port mapping\n-p 8080:80"| IPTABLES
    IPTABLES <--> Bridge
    IPTABLES <--> UserNet

    style Bridge fill:#0db7ed,color:#fff
    style UserNet fill:#2e7d32,color:#fff
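
The -p 8080:80 mapping in the diagram follows the pattern [host-ip:]host-port:container-port[/protocol]. A simplified parser for that pattern is sketched below; the function name parse_publish is hypothetical, and the sketch deliberately ignores forms the real CLI also accepts, such as port ranges, a bare container port, and bracketed IPv6 addresses.

```python
def parse_publish(spec):
    """Parse a simplified -p argument into its parts.

    host_ip is None when no address is given, meaning all host interfaces.
    """
    protocol = "tcp"
    if "/" in spec:
        spec, protocol = spec.rsplit("/", 1)
    parts = spec.split(":")
    if len(parts) == 2:
        host_ip, host_port, container_port = None, parts[0], parts[1]
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return {
        "host_ip": host_ip,
        "host_port": int(host_port),
        "container_port": int(container_port),
        "protocol": protocol,
    }


print(parse_publish("8080:80"))
print(parse_publish("127.0.0.1:5353:53/udp"))
```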

Network Drivers

| Driver | Use Case |
|---|---|
| bridge | Default single-host networking; containers on the same host communicate via veth pairs |
| host | Container shares the host network namespace; no isolation, maximum performance |
| overlay | Multi-host networking via VXLAN; used by Swarm |
| macvlan | Container gets its own MAC address; appears as a physical device on the network |
| ipvlan | Like macvlan but shares the host MAC; L2 or L3 mode |
| none | No networking; only the loopback interface is available |

Storage Model

| Storage Type | Lifecycle | Use Case |
|---|---|---|
| Union FS layers | Tied to container | Ephemeral container filesystem |
| Named volumes | Independent of container | Databases, persistent state |
| Bind mounts | Host path → container | Development (live code reload) |
| tmpfs | In-memory only | Temporary sensitive data |
| Volume plugins | Driver-dependent | NFS, EBS, Ceph, etc. |


Benchmarks

Scope

Performance characteristics, scaling limits, and resource consumption for Docker.

Container Performance

| Metric | Docker | Native | Overhead |
|---|---|---|---|
| CPU | Near-native | Baseline | < 1% |
| Memory | Near-native | Baseline | 10-30 MB per container |
| Network (bridge) | 90-95% native | Baseline | 5-10% |
| Network (host) | Near-native | Baseline | < 1% |
| Disk I/O (overlay2) | 85-95% native | Baseline | 5-15% |
| Disk I/O (bind mount) | Near-native | Baseline | < 1% |

Image Size Benchmarks

| Base Image | Size | Use Case |
|---|---|---|
| scratch | 0 MB | Static Go binaries |
| alpine | 7 MB | Minimal Linux |
| distroless | 15-25 MB | Secure, no shell |
| debian-slim | 80 MB | Compatibility needed |
| ubuntu | 77 MB | Full Linux tools |
| node:22-alpine | 130 MB | Node.js apps |
| python:3.12-slim | 150 MB | Python apps |

Build Performance

| Strategy | Cold Build | Cached Build | Size Impact |
|---|---|---|---|
| Single stage | 30-120 s | 5-30 s | Large (500 MB+) |
| Multi-stage | 60-180 s | 10-30 s | Small (50-150 MB) |
| BuildKit cache | 60-180 s | 3-10 s | Same |
| Buildx bake | 30-90 s | 3-10 s | Parallel builds |

Compose Scaling

| Containers | Memory (runtime) | Startup Time | Notes |
|---|---|---|---|
| 10 | 100-500 MB | 5-15 s | Typical dev setup |
| 50 | 500 MB-2 GB | 15-45 s | Medium application |
| 100 | 1-5 GB | 30-120 s | Large stack |

Sourcing Status

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.
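
As a starting point for such measurements, here is a minimal timing harness. It is a sketch rather than a rigorous benchmark tool: the function name time_runs is illustrative, and it does not control for caching, warm-up, or background load.

```python
import statistics
import time


def time_runs(action, repeats=5):
    """Call action() repeatedly and summarize wall-clock durations in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Stand-in workload; replace it with the operation you want to measure.
print(time_runs(lambda: sum(range(100_000)), repeats=3))
```

To measure container start-up, for instance, action could wrap subprocess.run(["docker", "run", "--rm", "alpine", "true"], check=True); record the hardware, Docker version, and image alongside the results so the numbers stay interpretable.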
