
Web Services & APIs — Operations

Practical guide to deploying, documenting, securing, versioning, testing, and monitoring web service APIs.


API Specification Formats

Specifications are machine-readable contracts for APIs — enabling codegen, mock servers, linting, and documentation.

OpenAPI 3.1 (REST)

The industry standard for describing RESTful HTTP APIs. Version 3.1 aligns with JSON Schema draft 2020-12.

openapi: 3.1.0
info:
  title: Orders API
  version: 2.4.0
  contact:
    email: [email protected]
  license:
    name: Apache 2.0
servers:
  - url: https://api.example.com/v2
    description: Production
  - url: https://sandbox.api.example.com/v2
    description: Sandbox

paths:
  /orders/{orderId}:
    get:
      operationId: getOrder
      summary: Retrieve a single order
      tags: [Orders]
      parameters:
        - name: orderId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: Order found
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Order"
        "404":
          $ref: "#/components/responses/NotFound"
      security:
        - bearerAuth: []

components:
  schemas:
    Order:
      type: object
      required: [id, status, createdAt]
      properties:
        id:
          type: string
          format: uuid
        status:
          type: string
          enum: [pending, confirmed, shipped, delivered, cancelled]
        createdAt:
          type: string
          format: date-time
    # Referenced by the NotFound response below (RFC 9457 problem details)
    ProblemDetail:
      type: object
      properties:
        type:
          type: string
          format: uri
        title:
          type: string
        status:
          type: integer
        detail:
          type: string
  responses:
    NotFound:
      description: Resource not found
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/ProblemDetail"
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

Key OpenAPI 3.1 improvements over 3.0:
  • Full JSON Schema 2020-12 alignment (replaces OpenAPI's extended subset)
  • webhooks top-level field for inbound webhooks
  • discriminator improvements, const, $schema per-schema
  • exclusiveMinimum/exclusiveMaximum now numeric (not boolean)

AsyncAPI 3.0 (Event-Driven APIs)

The OpenAPI equivalent for event-driven APIs over WebSocket, MQTT, Kafka, AMQP, and SNS/SQS.

asyncapi: 3.0.0
info:
  title: Order Events API
  version: 1.0.0
channels:
  orderCreated:
    address: orders.created
    messages:
      OrderCreated:
        payload:
          type: object
          properties:
            orderId:
              type: string
            customerId:
              type: string
operations:
  onOrderCreated:
    action: receive
    channel:
      $ref: "#/channels/orderCreated"

Protocol Buffers IDL (gRPC)

See architecture#protocol-buffers for the full .proto format. The .proto file IS the API spec for gRPC services.

Tooling comparison:

| Format | Ecosystem | Codegen | Mock Server | Linting |
|---|---|---|---|---|
| OpenAPI 3.1 | REST | Any language | Prism, WireMock | Spectral, Vacuum |
| AsyncAPI 3.0 | Event-driven | Node.js, Java | Microcks | AsyncAPI Studio |
| Protobuf | gRPC | Any language | grpc-go test server | buf lint |
| WSDL | SOAP | Java, .NET, Python | SoapUI | SoapUI |

API Gateways

An API gateway is the single entry point for all client traffic — handling routing, auth enforcement, rate limiting, observability, and protocol translation.

flowchart LR
    C1[Mobile Client] --> GW[API Gateway]
    C2[Browser] --> GW
    C3[Partner API] --> GW
    GW -->|/orders| OS[Orders Service]
    GW -->|/users| US[User Service]
    GW -->|/products| PS[Product Service]
    GW --> Auth[Auth Service]
    GW --> RL[Rate Limiter\nRedis]
    GW --> Log[Observability\nDatadog / Grafana]

Kong Gateway

Open-source gateway built on NGINX + OpenResty (Lua). Enterprise tier adds RBAC, Dev Portal, and Vitals analytics.

# Kong declarative config (decK format)
_format_version: "3.0"
services:
  - name: orders-service
    url: http://orders-service:8080
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: redis
          redis_host: redis
      - name: jwt
        config:
          claims_to_verify: [exp]
    routes:
      - name: orders-route
        paths: [/v2/orders]
        strip_path: false
        methods: [GET, POST, PUT, PATCH, DELETE]
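
# Kong Admin API: add a plugin to the route.
# correlation-id generates a fresh X-Request-ID for every request; a static
# header built with $(uuidgen) would be evaluated once and then reused.
curl -X POST http://kong:8001/routes/orders-route/plugins \
  --data name=correlation-id \
  --data config.header_name=X-Request-ID \
  --data config.generator=uuid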

Envoy Proxy

High-performance C++ proxy developed at Lyft. Operates as data plane in Istio service mesh. Configured via xDS APIs (dynamic) or static YAML.

# Envoy static config — HTTP rate limit filter
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: orders_api
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_service
        transport_api_version: V3

AWS API Gateway

Managed gateway for REST, HTTP, and WebSocket APIs. Integrates natively with Lambda, ALB, and VPC Link.

# Create HTTP API (simpler, lower cost than REST API)
aws apigatewayv2 create-api \
  --name orders-api \
  --protocol-type HTTP \
  --target arn:aws:lambda:us-east-1:123456789012:function:orders-handler

# Add JWT authorizer
aws apigatewayv2 create-authorizer \
  --api-id abc123 \
  --authorizer-type JWT \
  --identity-source '$request.header.Authorization' \
  --jwt-configuration Audience=orders-api,Issuer=https://auth.example.com \
  --name JwtAuthorizer

Gateway comparison:

| Gateway | Deployment | Config Model | Best For |
|---|---|---|---|
| Kong | Self-hosted / Cloud | Declarative YAML / Admin API | Large teams, plugin ecosystem |
| Envoy | Self-hosted (sidecar) | xDS (dynamic) / YAML | Service mesh, Kubernetes |
| AWS API Gateway | Managed | Console / CDK / SAM | AWS-native serverless |
| Nginx | Self-hosted | Static config file | Simple reverse proxy |
| Traefik | Self-hosted | Auto-discover (Kubernetes) | Kubernetes ingress |
| Azure API Management | Managed | Portal / ARM / Bicep | Azure-native |

Authentication and Authorization

API Keys

Simplest scheme. Suitable for server-to-server or developer access where OAuth overhead is unneeded.

GET /v2/orders HTTP/1.1
X-API-Key: sk_live_a1b2c3d4e5f6

Best practices:
  • Prefix keys by environment: sk_live_, sk_test_
  • Store only the hash (SHA-256) in the database, never plaintext
  • Rotate on compromise; provide a 30-day grace period during planned rotations
  • Associate keys with scopes: orders:read, orders:write
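The prefix and hash-only storage practices can be sketched in a few lines. This is a minimal illustration, not a complete key service: the function names `generate_api_key` and `key_matches` are hypothetical, and persistence, scopes, and rotation are left out.

```python
import hashlib
import hmac
import secrets


def generate_api_key(environment: str = "live") -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex).

    The plaintext is shown to the developer once; only the hash is persisted.
    """
    key = f"sk_{environment}_{secrets.token_urlsafe(32)}"
    return key, hashlib.sha256(key.encode()).hexdigest()


def key_matches(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare against the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented_key.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

A plain SHA-256 is reasonable here because the key is long and random; low-entropy secrets such as passwords would need a slow password hash instead.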

JWT (JSON Web Tokens)

Stateless bearer tokens. Three base64url-encoded parts: header, payload, signature.

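# Decode JWT payload without verification (debugging only).
# JWT segments are unpadded base64url: map the alphabet and restore padding first.
p=$(echo "eyJhbGci..." | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="$p="; done
echo "$p" | base64 -d | jq

// Payload claims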
{
  "sub": "user_01HXYZ",
  "iss": "https://auth.example.com",
  "aud": "orders-api",
  "exp": 1745600000,
  "iat": 1745596400,
  "scope": "orders:read orders:write",
  "jti": "01HXYZ-unique-token-id"
}

JWT security checklist:
  • Use RS256 (asymmetric) for public key distribution, not HS256 (shared secret)
  • Short expiry: 15 minutes for access tokens; refresh tokens via httpOnly cookies
  • Validate iss, aud, exp, nbf on every request
  • Include jti (JWT ID) for revocation lookup in Redis blocklist
  • Never store sensitive data in the payload: JWTs are encoded, not encrypted (use JWE for confidentiality)
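The claim checks from the checklist might look like the sketch below. It assumes the signature has already been verified by a JWT library and only inspects the decoded payload; the function name `validate_claims` and the 30-second clock-skew `leeway` are illustrative choices, not part of any standard API.

```python
import time
from typing import Optional


def validate_claims(claims: dict, issuer: str, audience: str,
                    now: Optional[float] = None, leeway: int = 30) -> list:
    """Return a list of problems; an empty list means the claims passed."""
    now = time.time() if now is None else now
    problems = []

    if claims.get("iss") != issuer:
        problems.append("iss mismatch")

    # aud may be a single string or a list of strings
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append("aud mismatch")

    # exp is mandatory here; a token without it is rejected
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        problems.append("expired or missing exp")

    # nbf is optional, but must be honoured when present
    nbf = claims.get("nbf")
    if nbf is not None and now < nbf - leeway:
        problems.append("not yet valid (nbf)")

    return problems
```

Returning every problem rather than failing on the first makes the result easier to log; a real service would still respond with a single generic 401.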

OAuth 2.0 / OAuth 2.1

Authorization Code + PKCE (browser and mobile clients):

sequenceDiagram
    participant U as User
    participant C as Client App
    participant AS as Auth Server
    participant RS as Resource Server

    C->>C: Generate code_verifier, code_challenge = BASE64URL(SHA256(code_verifier))
    C->>AS: GET /authorize?response_type=code&client_id=...&code_challenge=...&code_challenge_method=S256
    AS->>U: Login + Consent screen
    U->>AS: Approve
    AS->>C: Redirect with ?code=AUTH_CODE
    C->>AS: POST /token {code, code_verifier, client_id}
    AS->>C: {access_token, refresh_token, expires_in}
    C->>RS: GET /orders Authorization: Bearer ACCESS_TOKEN
    RS->>C: 200 {orders: [...]}

Client Credentials (machine-to-machine):

curl -X POST https://auth.example.com/oauth/token \
  -d grant_type=client_credentials \
  -d client_id=service-account \
  -d client_secret=secret \
  -d scope="orders:read inventory:write"

OAuth 2.1 key changes (draft consolidation):
  • PKCE mandatory for all clients using the authorization code flow
  • Implicit flow removed
  • Resource Owner Password Credentials (ROPC) flow removed
  • Refresh tokens for public clients must be sender-constrained or one-time use (rotation)

mTLS (Mutual TLS)

Both client and server present certificates — eliminates shared secrets for service-to-service auth.

# Generate a client key and a cert signed by your CA
openssl genrsa -out client.key 2048
openssl req -new -key client.key -out client.csr \
  -subj "/CN=orders-service/O=internal"
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out client.crt -days 365

# Call API with client cert
curl --cert client.crt --key client.key \
  --cacert ca.crt \
  https://internal-api.example.com/v2/orders

In Kubernetes: use SPIFFE/SPIRE for automatic workload identity, or let Istio inject mTLS transparently via sidecar.


API Versioning

Versioning Strategies

| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URI path | /v2/orders | Most visible, easy routing | Breaks resource identity |
| Query param | /orders?version=2 | Non-breaking URL | Easily forgotten, cache unfriendly |
| Header | API-Version: 2024-01-01 | Clean URLs | Less discoverable |
| Content negotiation | Accept: application/vnd.api+json;version=2 | RFC-compliant | Complex client setup |

URI versioning is the most common choice for public APIs (used by Stripe and Twilio, for example). Stripe also uses a calendar-based version header (Stripe-Version: 2023-10-16) alongside the /v1 path for fine-grained migrations. GitHub's REST API selects versions with the X-GitHub-Api-Version header and does not put them in the path.

Calendar-Based Versioning (Stripe Pattern)

Instead of major version bumps, every breaking change gets a calendar date:

GET /v1/charges HTTP/1.1
Stripe-Version: 2023-10-16

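Each account is pinned to the API version that was current when it made its first request; the Stripe-Version header overrides that default per request. Customers opt into new versions explicitly.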

Deprecation and Sunset Headers (RFC 9745, RFC 8594)

HTTP/1.1 200 OK
Deprecation: @1767225600
Sunset: Fri, 01 Jan 2027 00:00:00 GMT
Link: <https://docs.example.com/migration/v3>; rel="successor-version"

  • Deprecation: when the endpoint was (or will be) deprecated, written as @ plus Unix seconds, here 2026-01-01T00:00:00Z (RFC 9745)
  • Sunset: when it will stop working, written as an HTTP-date (RFC 8594)
  • Link: migration guide
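
A small helper can keep the two date formats straight: RFC 9745 defines Deprecation as a structured-field date (@ followed by Unix seconds), while RFC 8594 defines Sunset as an HTTP-date. The sketch below assumes timezone-aware datetimes; the function name `lifecycle_headers` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def lifecycle_headers(deprecated_at: datetime, sunset_at: datetime,
                      successor_url: str) -> dict:
    """Build Deprecation, Sunset and Link headers from timezone-aware datetimes."""
    return {
        # Structured-field date: "@" followed by Unix seconds
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # HTTP-date, always expressed in GMT
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }
```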

Non-Breaking vs Breaking Changes

Non-breaking (safe to ship):
  • Adding optional request fields
  • Adding new response fields
  • Adding new endpoints
  • New enum values (unless clients use exhaustive matching)

Breaking (require new version):
  • Removing or renaming fields
  • Changing field types
  • Changing HTTP method for an operation
  • Altering authentication requirements
  • Removing enum values


Rate Limiting

Rate limiting protects services from abuse, ensures fair usage, and enables monetization tiers.

Algorithms

Token Bucket (allow bursting):

capacity = 100 tokens
refill_rate = 10 tokens/second

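on request:
  # refill for the time elapsed since the last request, capped at capacity
  tokens = min(capacity, tokens + elapsed_seconds * refill_rate)
  if tokens >= cost:
    tokens -= cost
    return ALLOW
  else:
    return 429 Too Many Requests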

AWS API Gateway throttles with a token bucket. Kong's open-source rate-limiting plugin, by contrast, counts requests in fixed windows.
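
A runnable version of the token bucket, as a single-process sketch. The class name and the injectable `clock` parameter (added so the behaviour can be tested deterministically) are assumptions of this example; a shared limiter would keep the state in Redis or similar.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill lazily for the elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller responds with 429 Too Many Requests
```

With capacity 100 and a refill rate of 10 per second, a client can burst 100 requests at once and then sustain 10 per second.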

Sliding Window Log (most precise):

Stores timestamp of each request. Counts requests within [now - window, now]. High memory cost at scale.

Sliding Window Counter (approximation, low memory):

rate = (prev_count × (1 - elapsed/window)) + curr_count

Here elapsed is the time since the current window began. Redis-based implementation: two counters (current window, previous window) per key.
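
The weighting formula can be tried out with an in-memory sketch. It stands in for the Redis counters with two integers per limiter, and the class name and injectable `clock` are assumptions of this example.

```python
import time


class SlidingWindowCounter:
    def __init__(self, limit: int, window: float, clock=time.time):
        self.limit = limit
        self.window = window  # window length in seconds
        self.clock = clock
        self.window_index = int(clock() // window)
        self.curr_count = 0
        self.prev_count = 0

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window)
        if index != self.window_index:
            # One window ahead: current becomes previous. Further ahead: both reset.
            self.prev_count = self.curr_count if index == self.window_index + 1 else 0
            self.curr_count = 0
            self.window_index = index
        elapsed = now - index * self.window  # time since the current window began
        rate = self.prev_count * (1 - elapsed / self.window) + self.curr_count
        if rate >= self.limit:
            return False
        self.curr_count += 1
        return True
```

Halfway through a window, half of the previous window's count still weighs against the limit, which smooths out the boundary spike of a fixed window.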

Fixed Window (simplest, boundary spike risk):

Resets the counter at fixed intervals. A full burst at 11:59:59 followed by another at 12:00:01 lets through up to 2× the limit within two seconds.

Response Headers

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1745600000

On 429:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1745600000
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "You have exceeded 1000 requests per minute."
}

Rate Limit Keys

Choose the right granularity:

| Key | Use Case |
|---|---|
| IP address | Unauthenticated public APIs, DDoS protection |
| API key | Developer tier enforcement |
| User ID | Per-account limits after auth |
| Endpoint | Expensive operations (e.g., /search) |
| Tenant ID | SaaS multi-tenant isolation |

CORS (Cross-Origin Resource Sharing)

CORS tells browsers which origins may read responses from your API. It does NOT protect server-to-server calls, which ignore CORS headers entirely.

# Preflight request (browser auto-sends for non-simple requests)
OPTIONS /v2/orders HTTP/1.1
Origin: https://app.example.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Authorization, Content-Type

# Server response
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE, OPTIONS
Access-Control-Allow-Headers: Authorization, Content-Type, X-Request-ID
Access-Control-Max-Age: 86400
Access-Control-Allow-Credentials: true

Critical rules:
  • Never set Access-Control-Allow-Origin: * with Access-Control-Allow-Credentials: true; browsers block it
  • Maintain an allowlist of trusted origins; validate dynamically against it, and send Vary: Origin when echoing the origin back
  • Cache preflight with Access-Control-Max-Age to reduce OPTIONS overhead
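
Dynamic validation against an allowlist can be as small as the sketch below. The origins in `ALLOWED_ORIGINS` and the function name `cors_headers` are hypothetical; a real service would load the list from configuration.

```python
from typing import Optional

# Hypothetical allowlist of trusted browser origins
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}


def cors_headers(origin: Optional[str]) -> dict:
    """Echo the Origin back only when it is on the allowlist."""
    if origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser blocks the cross-origin read
    return {
        "Access-Control-Allow-Origin": origin,  # exact origin, never "*" with credentials
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",  # keep caches from reusing a response across origins
    }
```

Exact string matching is deliberate: suffix or substring checks are a common way for lookalike origins to slip through.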


API Design Best Practices

Resource Naming

# Good — noun-based, plural, lowercase
GET    /v2/orders
POST   /v2/orders
GET    /v2/orders/{orderId}
PUT    /v2/orders/{orderId}
PATCH  /v2/orders/{orderId}
DELETE /v2/orders/{orderId}

# Nested resources — use sparingly; max 2 levels deep
GET /v2/orders/{orderId}/items
POST /v2/orders/{orderId}/items

# Actions (verbs) — use only for operations that don't map to CRUD
POST /v2/orders/{orderId}/cancel
POST /v2/orders/{orderId}/refund
POST /v2/payments/{paymentId}/capture

Idempotency Keys

Prevent duplicate processing when clients retry on network failure.

POST /v2/orders HTTP/1.1
Idempotency-Key: 01HXYZ-unique-request-id
Content-Type: application/json

{"productId": "prod_123", "quantity": 2}

Server logic:
1. Hash Idempotency-Key → look up in idempotency store (Redis/DB)
2. If found and result cached → return cached response immediately
3. If found and in-flight → return 409 Conflict or wait
4. If not found → process, store result keyed to hash, return result

TTL: 24–48 hours (per Stripe: 24h)
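
The four server-side steps can be sketched with an in-memory dictionary standing in for Redis or a database. The class name `IdempotencyStore` is hypothetical, and the sketch is single-threaded and omits the TTL; a production store needs an atomic "set if absent" and expiry.

```python
import hashlib


class IdempotencyStore:
    IN_FLIGHT = object()  # sentinel marking a request that is still being processed

    def __init__(self):
        self._results = {}

    def execute(self, idempotency_key: str, handler):
        """Run handler() at most once per key and return its (status, body) result."""
        slot = hashlib.sha256(idempotency_key.encode()).hexdigest()
        if slot in self._results:
            cached = self._results[slot]
            if cached is self.IN_FLIGHT:
                return 409, {"title": "A request with this Idempotency-Key is in progress"}
            return cached  # replay the stored response without re-processing
        self._results[slot] = self.IN_FLIGHT
        try:
            result = handler()
        except Exception:
            del self._results[slot]  # allow the client to retry after a failure
            raise
        self._results[slot] = result
        return result
```

Calling `execute` twice with the same key runs the handler once and returns the identical response both times.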

Pagination

Cursor-based (recommended for large/real-time datasets):

// Request: GET /v2/orders?limit=20&after=01HXYZ
{
  "data": [...],
  "pagination": {
    "limit": 20,
    "hasNextPage": true,
    "nextCursor": "01HABC",
    "hasPrevPage": true,
    "prevCursor": "01HWXY"
  }
}
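
A forward-only sketch of cursor pagination over an in-memory list. It assumes the records are sorted by a unique `id` and uses the last id on the page as the cursor; the function name `paginate` is hypothetical, and real APIs usually encode the cursor opaquely and push the filter into the database query.

```python
from typing import Optional


def paginate(orders: list, limit: int = 20, after: Optional[str] = None) -> dict:
    """Return one page of `orders` (sorted by 'id' ascending) after the given cursor."""
    remaining = [o for o in orders if after is None or o["id"] > after]
    page = remaining[:limit]
    has_next = len(remaining) > limit
    return {
        "data": page,
        "pagination": {
            "limit": limit,
            "hasNextPage": has_next,
            # The cursor is the last id served; None when there is nothing further
            "nextCursor": page[-1]["id"] if has_next else None,
        },
    }
```

Because the cursor names a position in the ordering rather than a row count, rows inserted or deleted between requests do not shift the following page.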

Offset-based (simpler, avoid for real-time data — page drift on inserts):

// Request: GET /v2/orders?limit=20&offset=40
{
  "data": [...],
  "pagination": {
    "total": 1847,
    "limit": 20,
    "offset": 40,
    "pages": 93
  }
}

Standardized Error Responses (RFC 9457 / Problem Details)

{
  "type": "https://api.example.com/errors/validation-error",
  "title": "Validation Error",
  "status": 422,
  "detail": "Request body contains invalid fields.",
  "instance": "/v2/orders/01HXYZ",
  "errors": [
    {
      "field": "quantity",
      "message": "Must be a positive integer",
      "code": "INVALID_VALUE"
    },
    {
      "field": "productId",
      "message": "Product not found",
      "code": "RESOURCE_NOT_FOUND"
    }
  ],
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
}

Always include traceId or requestId for support/debugging correlation.

Long-Running Operations (202 Async Pattern)

# 1. Client submits job
POST /v2/reports HTTP/1.1
{"type": "monthly-revenue", "month": "2026-03"}

# 2. Server accepts immediately
HTTP/1.1 202 Accepted
Location: /v2/reports/jobs/job_01HXYZ
Retry-After: 30

# 3. Client polls
GET /v2/reports/jobs/job_01HXYZ

# 4a. Still processing
HTTP/1.1 200 OK
{"status": "processing", "progress": 42, "estimatedCompletion": "2026-04-25T10:15:00Z"}

# 4b. Complete
HTTP/1.1 200 OK
{"status": "complete", "resultUrl": "/v2/reports/rpt_01HABC", "expiresAt": "2026-04-26T10:00:00Z"}

# 5. Retrieve result
GET /v2/reports/rpt_01HABC

Alternative: use webhook callback instead of polling — POST /v2/reports body includes callbackUrl.

Filtering, Sorting, Searching

# Filtering — use query params
GET /v2/orders?status=pending&customerId=cust_123&createdAfter=2026-01-01

# Sorting — field and direction
GET /v2/orders?sort=-createdAt,status   # minus = desc, no prefix = asc (a literal + in a query string decodes as a space)

# Sparse fieldsets — reduce payload size
GET /v2/orders?fields=id,status,total

# Full-text search
GET /v2/products?q=wireless+headphones&category=electronics

API First Design

Design the API contract before writing implementation code.

Workflow:
1. Write OpenAPI spec in YAML (use Spectral to lint against rules)
2. Generate mock server with Prism: prism mock openapi.yaml
3. Share mock URL with frontend team so both sides develop in parallel
4. Generate server stubs with oapi-codegen (Go), openapi-generator (Java/Python/etc.)
5. Write implementation against generated interfaces
6. Run contract tests against live server to verify spec compliance

# Prism mock server (read OpenAPI spec, serve mock responses)
npx @stoplight/prism-cli mock openapi.yaml --port 4010

# Call mock
curl http://localhost:4010/v2/orders/01HXYZ \
  -H "Authorization: Bearer test-token"

# Prism validation proxy (forward to real server, validate request/response against spec)
npx @stoplight/prism-cli proxy openapi.yaml http://localhost:8080

Testing

REST API Testing (curl)

# GET with auth header and pretty JSON
curl -s -X GET https://api.example.com/v2/orders/01HXYZ \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/json" | jq

# POST with JSON body
curl -s -X POST https://api.example.com/v2/orders \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{"productId": "prod_123", "quantity": 2}' | jq

# Test rate limiting — fire 10 requests rapidly
for i in {1..10}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer $TOKEN" \
    https://api.example.com/v2/orders
done

# Inspect headers only
curl -sI https://api.example.com/v2/orders

# Follow redirects, show timing
curl -v -w "@curl-format.txt" -L https://api.example.com/v2/orders

gRPC Testing (grpcurl)

# Install
brew install grpcurl

# List services (server reflection must be enabled)
grpcurl -plaintext localhost:50051 list

# Describe a service
grpcurl -plaintext localhost:50051 describe orders.OrderService

# Unary call
grpcurl -plaintext \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"order_id": "01HXYZ"}' \
  localhost:50051 orders.OrderService/GetOrder

# Server streaming call
grpcurl -plaintext \
  -d '{"customer_id": "cust_123"}' \
  localhost:50051 orders.OrderService/WatchOrders

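# Call with TLS and a client certificate. grpcurl stops parsing flags at the
# first positional argument, so -d must come before the address.
grpcurl \
  -cert client.crt -key client.key -cacert ca.crt \
  -d '{"order_id": "01HXYZ"}' \
  api.example.com:443 orders.OrderService/GetOrder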

WebSocket Testing (wscat)

# Install
npm install -g wscat

# Connect to WebSocket server
wscat -c wss://api.example.com/ws \
  --header "Authorization: Bearer $TOKEN"

# Send a message (after connecting)
> {"type": "subscribe", "channel": "orders", "customerId": "cust_123"}
< {"type": "subscribed", "channel": "orders"}
< {"type": "order.updated", "orderId": "01HXYZ", "status": "shipped"}

# Connect with subprotocol
wscat -c wss://api.example.com/ws --subprotocol "v2.orders"

GraphQL Testing (curl + jq)

# Introspection query
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __schema { types { name } } }"}' | jq '.data.__schema.types[].name'

# Query with variables
curl -s -X POST https://api.example.com/graphql \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query GetOrder($id: ID!) { order(id: $id) { status total } }",
    "variables": {"id": "01HXYZ"}
  }' | jq

# Mutation
curl -s -X POST https://api.example.com/graphql \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "mutation CancelOrder($id: ID!) { cancelOrder(id: $id) { success } }",
    "variables": {"id": "01HXYZ"}
  }' | jq

Load Testing (k6)

// k6 load test script — orders API
import http from "k6/http";
import { check, sleep } from "k6";
import { Rate } from "k6/metrics";

const errorRate = new Rate("errors");

export const options = {
  stages: [
    { duration: "30s", target: 50 },   // ramp up to 50 VUs
    { duration: "2m", target: 50 },    // hold
    { duration: "30s", target: 200 },  // spike to 200 VUs
    { duration: "1m", target: 200 },   // hold spike
    { duration: "30s", target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500"],  // 95th percentile < 500ms
    errors: ["rate<0.01"],             // error rate < 1%
  },
};

export default function () {
  const res = http.get("https://api.example.com/v2/orders", {
    headers: { Authorization: `Bearer ${__ENV.API_TOKEN}` },
  });
  const ok = check(res, {
    "status is 200": (r) => r.status === 200,
    "response time < 500ms": (r) => r.timings.duration < 500,
  });
  errorRate.add(!ok);
  sleep(1);
}

# Run the script, passing the token through the environment
k6 run --env API_TOKEN="$TOKEN" load-test.js

Contract Testing (Pact)

Consumer-driven contract tests verify that API providers honour contracts expected by consumers.

# Consumer writes expectations → generates pact file
# Provider verifies pact file against running service

# Publish to Pact Broker
npx pact-broker publish ./pacts \
  --broker-base-url https://your-pact-broker.example.com \
  --consumer-app-version $(git rev-parse HEAD)

# Provider verifies
npx pact-provider-verifier \
  --provider-base-url http://localhost:8080 \
  --pact-broker-base-url https://your-pact-broker.example.com \
  --provider orders-service

Monitoring and Observability

Key Metrics (RED Method)

| Metric | Description | Alert Threshold (example) |
|---|---|---|
| Rate | Requests per second | Traffic drop > 50% vs baseline |
| Errors | 5xx error rate | > 1% over 5 minutes |
| Duration | p50, p95, p99 latency | p99 > 1000ms |

Additional API-specific metrics:
  • 4xx rate (client errors): a spike may indicate a breaking change or client bug
  • Auth failure rate: a spike indicates a credential attack or misconfiguration
  • Rate limit hit rate (429 responses): indicates capacity planning needs
  • Payload size distribution: detects runaway requests

Distributed Tracing (OpenTelemetry)

# Node.js — auto-instrumentation with OTLP export
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node

# Inject trace context headers
GET /v2/orders HTTP/1.1
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: rend=congo

Propagate traceparent across all service boundaries. Every response should include X-Request-ID or X-Trace-ID tied to the trace.
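
For services that handle the header by hand, a parser for the version 00 traceparent layout might look like this sketch. The function name `parse_traceparent` and the returned field names are assumptions; OpenTelemetry SDKs do this parsing internally.

```python
import re

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex), lowercase only
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str):
    """Return the parsed fields, or None if the header is malformed or invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are explicitly invalid in W3C Trace Context
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "traceId": trace_id,
        "parentId": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest flag bit marks a sampled trace
    }
```

The `traceId` field is the value to copy into structured logs and into the X-Trace-ID response header.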

Structured Logging

{
  "level": "info",
  "timestamp": "2026-04-25T10:00:00.123Z",
  "service": "orders-api",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "method": "GET",
  "path": "/v2/orders/01HXYZ",
  "statusCode": 200,
  "durationMs": 47,
  "customerId": "cust_123",
  "region": "us-east-1"
}

Health Endpoints

# Liveness — is the process alive?
GET /health/live
HTTP/1.1 200 OK
{"status": "ok"}

# Readiness — is the service ready to receive traffic?
GET /health/ready
HTTP/1.1 200 OK
{
  "status": "ok",
  "checks": {
    "database": "ok",
    "cache": "ok",
    "dependencyServiceA": "ok"
  }
}

# Degraded state
HTTP/1.1 503 Service Unavailable
{
  "status": "degraded",
  "checks": {
    "database": "ok",
    "cache": "error",
    "dependencyServiceA": "ok"
  }
}

Circuit Breaker Pattern

Prevents cascading failures when a downstream dependency is degraded.

States:
  CLOSED → normal operation, requests pass through
  OPEN   → dependency is failing; requests fail fast with 503
  HALF_OPEN → test probe requests sent; if success → CLOSED, if fail → OPEN

Transition triggers:
  CLOSED → OPEN:     failure rate > 50% over last 10 requests (or time window)
  OPEN → HALF_OPEN:  after cooldown period (e.g. 30 seconds)
  HALF_OPEN → CLOSED: 3 consecutive successes
  HALF_OPEN → OPEN:   1 failure

Libraries: Resilience4j (Java), Polly (.NET), opossum (Node.js), gobreaker (Go).
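
The state machine can be sketched in plain Python using the example thresholds given with the transitions (failure rate above 50% over the last 10 calls, 30-second cooldown, 3 successes to close). The class and method names are hypothetical, and the sketch is not thread-safe; the libraries listed here handle concurrency and metrics.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=0.5, window_size=10, cooldown=30.0,
                 successes_to_close=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_size = window_size
        self.cooldown = cooldown
        self.successes_to_close = successes_to_close
        self.clock = clock
        self.state = "CLOSED"
        self.outcomes = []  # recent outcomes while CLOSED; True means failure
        self.opened_at = 0.0
        self.half_open_successes = 0

    def allow_request(self) -> bool:
        """False means fail fast (e.g. respond 503) without calling the dependency."""
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.cooldown:
            self.state = "HALF_OPEN"
            self.half_open_successes = 0
        return self.state != "OPEN"

    def record(self, success: bool) -> None:
        """Report the outcome of a call that allow_request() let through."""
        if self.state == "HALF_OPEN":
            if not success:
                self._open()  # a single failed probe reopens the circuit
            else:
                self.half_open_successes += 1
                if self.half_open_successes >= self.successes_to_close:
                    self.state = "CLOSED"
                    self.outcomes = []
        elif self.state == "CLOSED":
            self.outcomes = (self.outcomes + [not success])[-self.window_size:]
            window_full = len(self.outcomes) == self.window_size
            if window_full and sum(self.outcomes) / self.window_size > self.failure_threshold:
                self._open()

    def _open(self) -> None:
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.outcomes = []
```

Callers wrap each downstream call in `allow_request()` followed by `record(...)`.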


Webhooks as a Product

For APIs that offer webhooks, treat the delivery system as a first-class product.

Delivery Architecture

sequenceDiagram
    participant ES as Event Source
    participant Q as Message Queue
    participant WD as Webhook Dispatcher
    participant C as Customer Server

    ES->>Q: Publish event
    Q->>WD: Consume event
    WD->>C: POST /webhook (signed payload)
    alt Success (2xx)
        C->>WD: 200 OK (within 5s)
        WD->>Q: Ack message
    else Failure / Timeout
        WD->>Q: Nack / retry
        WD->>WD: Exponential backoff\n(5s, 25s, 125s, ...)
        WD->>WD: After 72h: mark dead, alert
    end

Payload Signing (HMAC-SHA256)

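import hashlib
import hmac
import time

def sign_payload(secret: str, payload: bytes) -> str:
    """Return a signature header value of the form t=<unix-ts>,v1=<hex-hmac>."""
    timestamp = str(int(time.time()))
    # Sign "<timestamp>.<raw body>" as bytes so non-UTF-8 payloads also work
    message = timestamp.encode() + b"." + payload
    signature = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={signature}"

def verify_signature(secret: str, payload: bytes, header: str, tolerance: int = 300) -> bool:
    try:
        parts = dict(part.split("=", 1) for part in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(time.time() - timestamp) > tolerance:
        return False  # outside the tolerance window: possible replay attack
    message = str(timestamp).encode() + b"." + payload
    expected = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature through timing
    return hmac.compare_digest(expected.encode(), received.encode())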

Reliability Patterns

| Pattern | Implementation |
|---|---|
| Idempotency keys | Include webhookId in payload; consumer deduplicates |
| Immediate 200 | Return 200 before processing; use queue for async work |
| Retry with backoff | 5s → 25s → 125s → 625s; max 72h delivery window |
| Dead letter queue | After max retries, route to DLQ; alert operator |
| Event ordering | Include sequence counter; consumer handles out-of-order |
| CloudEvents format | Standardize payload envelope (specversion, type, source, id) |
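
The retry schedule is a geometric series, and a short sketch makes the 72-hour cut-off concrete. The function name `retry_delays` is hypothetical, and the base of 5 seconds and factor of 5 are read off the 5s, 25s, 125s sequence; production dispatchers commonly also cap the largest delay and add jitter.

```python
def retry_delays(base: float = 5.0, factor: float = 5.0,
                 max_window: float = 72 * 3600) -> list:
    """Delays (seconds) before each retry, stopping before the total exceeds max_window."""
    delays, total, delay = [], 0.0, base
    while total + delay <= max_window:
        delays.append(delay)
        total += delay
        delay *= factor
    return delays
```

With these defaults the schedule has seven retries; an eighth would fall outside the 72-hour window, at which point the event goes to the dead letter queue.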

Webhook Management Portal (product features)

  • Endpoint registration with per-event-type subscription
  • Delivery attempt log with request/response bodies (last 30 days)
  • Manual replay of failed deliveries
  • HMAC secret rotation (grace period supporting both old + new key)
  • 200 OK webhook test endpoint for validation

API Tooling Ecosystem

| Category | Tools |
|---|---|
| API spec editors | Stoplight Studio, Swagger Editor, Redocly |
| Linting | Spectral (OpenAPI/AsyncAPI), buf lint (Protobuf) |
| Mock servers | Prism, WireMock, Microcks |
| Client testing | Postman, Insomnia, Bruno, HTTPie |
| CLI testing | curl, HTTPie, grpcurl, wscat, mqtt-cli |
| Load testing | k6, Gatling, Locust, Apache JMeter |
| Contract testing | Pact, Dredd, Schemathesis |
| Documentation | Redoc, Swagger UI, Scalar, Mintlify |
| API gateways | Kong, Envoy, AWS API Gateway, Traefik |
| Service mesh | Istio, Linkerd, Consul Connect |
| Code generation | openapi-generator, oapi-codegen, buf generate |
| Monitoring | Datadog APM, Grafana + Prometheus, New Relic |
