Operations¶
Scope
Production container operations, image management, networking, storage, security hardening, and monitoring.
Image Management¶
Build Optimization¶
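# Multi-stage build: build tooling and dev dependencies stay out of the runtime image
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step typically needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only production modules are copied forward
RUN npm run build && npm prune --omit=dev

FROM node:22-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
CMD ["node", "dist/main.js"]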
| Strategy | Impact | Notes |
|---|---|---|
| Multi-stage builds | 50-90% size reduction | Separate build and runtime stages |
| Alpine base images | 70% smaller than Debian | May have musl libc issues |
| `.dockerignore` | Faster builds | Exclude node_modules, .git, etc. |
| Layer caching | 10x faster rebuilds | Order COPY commands by change frequency |
| `--mount=type=cache` | Persistent caches | Package manager caches across builds |
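The `.dockerignore` row can be made concrete with a starter file. The following is a minimal sketch, assuming a Node.js project like the one in the Dockerfile above; `write_dockerignore` is a hypothetical helper name and the entry list is an assumption to adjust per project.

```shell
# Write a starter .dockerignore into a build-context directory.
# The entries are an assumed baseline for a Node.js project, not a complete list.
write_dockerignore() {
  local context_dir
  context_dir="${1:-.}"
  cat > "${context_dir}/.dockerignore" <<'EOF'
node_modules
dist
.git
.env
*.log
EOF
}
```

Calling `write_dockerignore .` before `docker build` keeps host-side `node_modules` and Git history out of the build context, which shrinks the context upload and avoids invalidating the `COPY . .` layer on unrelated changes.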
Image Security¶
# Scan for vulnerabilities
docker scout cves myimage:latest
trivy image myimage:latest
# Sign images (tags are mutable; prefer signing a digest reference, image@sha256:<digest>)
cosign sign --key cosign.key myregistry.io/myimage:latest
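A scan is most useful when it can fail a pipeline. The sketch below wraps the Trivy command above in a gate, assuming that HIGH and CRITICAL findings should block a release; `scan_gate` is a hypothetical helper name and the severity threshold is an assumption.

```shell
# Return non-zero when Trivy reports HIGH or CRITICAL vulnerabilities.
# --exit-code 1 makes trivy exit with status 1 if matching findings exist.
scan_gate() {
  local image
  image="$1"
  trivy image --exit-code 1 --severity HIGH,CRITICAL "$image"
}
```

In CI this could run as `scan_gate myimage:latest && cosign sign --key cosign.key myregistry.io/myimage:latest`, so that only images passing the scan get signed.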
Container Runtime¶
Resource Limits¶
# Run with resource constraints
docker run -d \
  --memory=512m --memory-swap=1g \
  --cpus=2.0 \
  --pids-limit=100 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  myapp:latest
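It can be worth confirming that the limits were actually applied. The sketch below compares the intended memory limit with the byte value Docker reports in `HostConfig.Memory`. `to_bytes` and `check_memory_limit` are hypothetical helper names, and the converter only handles whole numbers with an optional `b`, `k`, `m`, or `g` suffix.

```shell
# Convert a Docker-style size such as 512m or 1g to bytes (binary multiples).
# Assumption: integer values only; fractional sizes are not handled.
to_bytes() {
  local value number
  value="$(printf '%s' "$1" | tr 'KMGB' 'kmgb')"
  number="${value%[bkmg]}"
  case "$value" in
    *k) echo $(( number * 1024 )) ;;
    *m) echo $(( number * 1024 * 1024 )) ;;
    *g) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)  echo "$number" ;;
  esac
}

# Succeed only if the container's applied memory limit matches the expected size.
# HostConfig.Memory is reported in bytes; 0 means no limit is set.
check_memory_limit() {
  local container expected actual
  container="$1"
  expected="$(to_bytes "$2")"
  actual="$(docker inspect --format '{{.HostConfig.Memory}}' "$container")" || return 1
  [ "$actual" = "$expected" ]
}
```

For example, `check_memory_limit myapp 512m || echo "limit mismatch"` flags a container that was started without the intended `--memory` value.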
Health Checks¶
# Assumes curl is installed in the image; Alpine-based images ship BusyBox wget instead
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
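Deployment scripts often need to block until a health check passes. The following is a minimal sketch that polls `.State.Health.Status`, which Docker sets to `starting`, `healthy`, or `unhealthy` for containers that define a health check. `wait_healthy` is a hypothetical helper name, and the default of 30 attempts with a 2-second delay is an assumption.

```shell
# Poll a container's health status until it is healthy, unhealthy, or attempts run out.
# Usage: wait_healthy <container> [attempts] [delay_seconds]
wait_healthy() {
  local container attempts delay status i
  container="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # An empty status (no health check defined, or container missing) keeps polling.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) return 1 ;;
    esac
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}
```

A typical use is `wait_healthy myapp 30 2 || docker logs --tail 50 myapp`, which prints recent logs when the container never becomes healthy.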
Compose in Production¶
# docker-compose.prod.yml
services:
  web:
    image: myapp:${TAG}
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2.0'
          memory: 512M
      restart_policy:
        condition: on-failure
        max_attempts: 3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
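Because the image reference depends on `${TAG}`, an unset variable produces an invalid image name. The sketch below guards against that and validates the file before starting anything. `deploy_prod` is a hypothetical helper name; `--wait` assumes a Compose v2 release that supports it.

```shell
# Validate and start the production stack, refusing to run without an image tag.
deploy_prod() {
  if [ -z "${TAG:-}" ]; then
    echo "TAG must be set, for example TAG=1.4.2" >&2
    return 1
  fi
  # config --quiet validates the file and its variable interpolation without output.
  docker compose -f docker-compose.prod.yml config --quiet &&
    # --wait blocks until services are running or healthy.
    docker compose -f docker-compose.prod.yml up -d --wait
}
```

An alternative that needs no wrapper is Compose's required-variable syntax, `image: myapp:${TAG:?TAG must be set}`, which makes Compose itself abort when the variable is missing.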
Common Issues¶
| Issue | Diagnosis | Fix |
|---|---|---|
| Container OOMKilled | `docker inspect --format='{{.State.OOMKilled}}' <container>` | Increase memory limit or fix leak |
| Disk space exhausted | `docker system df` | `docker system prune -a --volumes` (deletes unused images and volumes) |
| DNS resolution fails | `docker exec -it app nslookup host` | Check Docker DNS (127.0.0.11) |
| Slow builds | Layer cache invalidation | Reorder Dockerfile, use BuildKit |
| Port conflict | `docker port <container>` | Change host port mapping |
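The first row of the table can be turned into a small triage step. The following sketch reads the OOM flag and exit code in one `docker inspect` call and prints a short classification. `diagnose_exit` is a hypothetical helper name and the output labels are assumptions.

```shell
# Classify why a container stopped, based on its OOM flag and exit code.
diagnose_exit() {
  local container state oom code
  container="$1"
  state="$(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' "$container")" || return 1
  oom="${state%% *}"    # text before the first space: true or false
  code="${state##* }"   # text after the last space: numeric exit code
  if [ "$oom" = "true" ]; then
    echo "oom-killed"
  elif [ "$code" = "0" ]; then
    echo "clean-exit"
  else
    echo "failed:$code"
  fi
}
```

Running `diagnose_exit myapp` after a crash separates memory-limit problems from ordinary application failures, where `docker logs` is usually the next step.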
Monitoring¶
# Real-time resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
# Container logs
docker logs --since 1h --tail 100 -f <container>
# Events stream
docker events --since '2026-04-12T00:00:00' --filter type=container
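For simple alerting, `docker stats` output can be filtered by a threshold. The sketch below reads `name percent` lines and prints the containers at or above a memory percentage. `flag_high_memory` is a hypothetical helper name and the default threshold of 80 is an assumption.

```shell
# Read "name percent" lines (for example "web 91.20%") from stdin and print the
# names whose memory percentage is at or above the threshold (default 80).
flag_high_memory() {
  awk -v limit="${1:-80}" '{
    pct = $2
    sub(/%$/, "", pct)              # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $1
  }'
}

# Usage against a running daemon:
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_memory 80
```

Note that `MemPerc` is relative to the container's memory limit, or to host memory when no limit is set, so the threshold is most meaningful for containers started with `--memory`.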
Commands & Recipes¶
Essential CLI commands, Dockerfile patterns, and operational recipes.
Container Lifecycle¶
# Run a container
docker run -d --name myapp -p 8080:80 nginx:1.27
# Run with resource limits
docker run -d --name myapp \
  --cpus="2.0" --memory="512m" \
  --restart=unless-stopped \
  nginx:1.27
# Execute command inside running container
docker exec -it myapp /bin/sh
# View logs (follow + tail)
docker logs -f --tail 100 myapp
# View resource usage
docker stats myapp
# Stop gracefully (SIGTERM, then SIGKILL after 30s), then remove the container
docker stop -t 30 myapp
docker rm myapp
# Inspect container details (JSON)
docker inspect myapp | jq '.[0].NetworkSettings.IPAddress'
Image Management¶
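# Build the stage named "production" with BuildKit, reusing a registry image as cache.
# DOCKER_BUILDKIT=1 is only needed on Docker Engine older than 23.0.
# --cache-from helps only if that image was built with inline cache metadata.
DOCKER_BUILDKIT=1 docker build \
  --target production \
  --cache-from myregistry/myapp:cache \
  -t myapp:latest .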
# Multi-platform build
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --push \
  -t myregistry/myapp:latest .
# Scan image for vulnerabilities
docker scout cves myapp:latest
# Prune unused images
docker image prune -a --filter "until=24h"
# Export and import images (air-gapped)
docker save myapp:latest | gzip > myapp.tar.gz
docker load < myapp.tar.gz
Docker Compose¶
# compose.yaml
services:
  web:
    image: nginx:1.27
    ports:
      - "8080:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      db:
        condition: service_healthy
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 256M
  db:
    image: postgres:17
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    secrets:
      - db_password
volumes:
  pgdata:
secrets:
  db_password:
    file: ./secrets/db_password.txt
# Lifecycle
docker compose up -d
docker compose ps
docker compose logs -f web
docker compose down -v # also removes named volumes, including pgdata (data loss)
Networking¶
# Create custom network
docker network create --driver bridge --subnet 10.0.0.0/24 mynet
# Connect container to network
docker network connect mynet myapp
# Inspect network
docker network inspect mynet | jq '.[0].Containers'
# DNS resolution by container name (user-defined networks only, not the default bridge)
docker exec myapp ping -c 1 db # resolves to the container IP; requires ping in the image
Cleanup¶
# Nuclear cleanup (removes everything unused)
docker system prune -a --volumes
# Selective cleanup
docker container prune # stopped containers
docker image prune -a # unused images
docker volume prune # unused volumes
docker network prune # unused networks
# Check disk usage
docker system df -v
Dockerfile Best Practices¶
# syntax=docker/dockerfile:1
# ---- Build Stage ----
FROM golang:1.25-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
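RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
# Cache mounts are not stored in image layers, so mount the module cache again here
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -ldflags="-s -w" -o /app/server .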
# ---- Runtime Stage ----
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
EXPOSE 8080
USER nonroot:nonroot
ENTRYPOINT ["/server"]
Troubleshooting¶
# Debug container that won't start
docker run --rm -it --entrypoint /bin/sh myapp:latest
# Check OOM kills
docker inspect myapp | jq '.[0].State.OOMKilled'
# View filesystem changes
docker diff myapp
# Copy files out of container
docker cp myapp:/var/log/app.log ./app.log
# Monitor events
docker events --filter 'event=die' --filter 'event=oom'