Web Services Architecture¶

Deep dive into every major API paradigm — how each protocol works under the hood, when to use each, and how they compare.

Protocol Comparison Overview¶

graph TD
    A[Client needs data] --> B{Use case?}
    B -->|Public API, CRUD, browser-native| C[REST]
    B -->|Flexible queries, complex frontends| D[GraphQL]
    B -->|Internal service-to-service, streaming| E[gRPC]
    B -->|Real-time bidirectional| F[WebSocket]
    B -->|Server pushes only, notifications| G[SSE]
    B -->|TypeScript full-stack only| H[tRPC]
    B -->|Event notification to external systems| I[Webhooks]
    B -->|Legacy enterprise integration| J[SOAP]

Protocol	Transport	Format	Direction	Browser Native	Best For
REST	HTTP/1.1, HTTP/2	JSON (typically)	Req/Res	✅	Public APIs, CRUD, resource modeling
GraphQL	HTTP/1.1, HTTP/2	JSON	Req/Res + Subscription	✅	Complex frontends, data aggregation
gRPC	HTTP/2 only	Protocol Buffers (binary)	Req/Res + Streaming	⚠️ (needs proxy)	Internal microservices, high-throughput
SOAP	HTTP, SMTP, TCP	XML	Req/Res	✅	Legacy enterprise, financial services
WebSocket	WS (TCP upgrade)	Any (text/binary)	Full-duplex	✅	Real-time chat, gaming, collaboration
SSE	HTTP/1.1, HTTP/2	Text (UTF-8)	Server → Client only	✅	Feeds, notifications, AI streaming
Webhooks	HTTP POST	JSON (typically)	Server → Server push	N/A (delivered to a server endpoint)	Event-driven integrations, automation
tRPC	HTTP/WebSocket	JSON	Req/Res + Subscription	✅ (TypeScript on client and server)	TypeScript full-stack monorepos

REST (Representational State Transfer)¶

Roy Fielding defined REST in his 2000 doctoral dissertation as an architectural style — not a protocol — built on six constraints that, when applied together, produce a scalable, stateless, and cacheable web service.

The Six Architectural Constraints¶

1. Client–Server Separation¶

The client and server evolve independently. The server manages data storage and business logic; the client manages the user interface and user state. Neither depends on the other's implementation details — only the shared API contract.

This decoupling allows frontend teams to swap frameworks (React → Vue) or mobile clients to evolve, without requiring backend changes, and vice versa.

2. Stateless¶

Every request from client to server must contain all information necessary to understand and process the request. The server stores no session state between requests.

❌ Stateful (server stores session):
POST /login       → server creates session, returns cookie
GET /dashboard    → server reads session to identify user

✅ Stateless (client carries state):
GET /dashboard
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...

Consequences:
- Scalability: any server instance can handle any request, so no sticky sessions are needed
- Reliability: no session state is lost if a server crashes
- Overhead: every request must carry auth credentials and context (larger payloads)

3. Cacheable¶

Responses must declare whether they are cacheable or not. When responses are cacheable, clients and intermediaries (CDNs, proxies) can serve them without contacting the server.

Key HTTP cache headers:

Header	Purpose	Example
Cache-Control	Directives for caching behavior	Cache-Control: max-age=3600, public
ETag	Fingerprint of resource version	ETag: "d8e8fca2dc0f896fd7cb4cb0031ba249"
Last-Modified	When resource last changed	Last-Modified: Wed, 22 Apr 2026 12:00:00 GMT
Vary	Which headers affect the cache key	Vary: Accept-Encoding, Authorization

Conditional requests let clients validate their cache:

GET /users/42
If-None-Match: "d8e8fca2dc0f896fd7cb4cb0031ba249"

→ 304 Not Modified (body omitted — client uses cached copy)
→ 200 OK + new ETag + new body (cache miss — resource changed)
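A minimal sketch of the server side of this exchange, written as a framework-free function. Assumptions: the names `handleGet`, `ifNoneMatchHits`, and `SimpleResponse` are hypothetical, and the ETag is derived from a truncated SHA-256 hash of the serialized JSON (one common choice; any stable fingerprint works). If-None-Match uses weak comparison, so a `W/` prefix is ignored on either side.

```typescript
import { createHash } from "node:crypto";

interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body?: string;
}

// Fingerprint of the representation (hypothetical scheme: truncated SHA-256 hex, quoted).
function computeETag(body: string): string {
  return `"${createHash("sha256").update(body).digest("hex").slice(0, 32)}"`;
}

// If-None-Match may list several tags or "*"; comparison is weak (W/ prefix ignored).
// Naive comma split: fine for hex ETags like the ones generated above.
function ifNoneMatchHits(ifNoneMatch: string, etag: string): boolean {
  const strip = (tag: string): string => tag.trim().replace(/^W\//, "");
  return ifNoneMatch
    .split(",")
    .some((candidate) => candidate.trim() === "*" || strip(candidate) === strip(etag));
}

// Hypothetical GET handler: 304 with no body when the client's cached copy is current.
function handleGet(resource: object, ifNoneMatch?: string): SimpleResponse {
  const body = JSON.stringify(resource);
  const etag = computeETag(body);
  if (ifNoneMatch !== undefined && ifNoneMatchHits(ifNoneMatch, etag)) {
    return { status: 304, headers: { ETag: etag } };
  }
  return { status: 200, headers: { ETag: etag }, body };
}
```

The first call returns 200 with an ETag; replaying that ETag in If-None-Match returns 304 until the resource's JSON changes.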

4. Uniform Interface¶

The single most important constraint. It defines four sub-principles:

4a. Resource Identification in Requests — every resource has a stable URI:

/users                        → collection of users
/users/42                     → specific user
/users/42/orders              → orders belonging to user 42
/users/42/orders/7/items      → items in that order

4b. Manipulation via Representations — clients hold representations (JSON, XML, HTML), not live objects. The client modifies the representation and sends it back.

4c. Self-Descriptive Messages — each request/response carries enough metadata to describe how to process it: Content-Type, method, status code, cache directives.

4d. HATEOAS — see section below.

5. Layered System¶

Clients cannot tell whether they're connected directly to the server or an intermediary (load balancer, CDN, API gateway, caching proxy). Each layer only knows about the adjacent layer.

This enables transparent insertion of:
- CDNs for caching at the edge
- API gateways for auth, rate limiting, routing
- Load balancers for distributing traffic
- Service meshes for observability and mTLS

6. Code on Demand (optional)¶

The only optional constraint. Servers can temporarily extend client functionality by transferring executable code (e.g., JavaScript). Rarely relevant in modern API design.

HTTP Methods and Idempotency¶

Method	Semantics	Idempotent	Safe	Common Use
GET	Retrieve resource(s)	✅	✅	Read data
HEAD	GET without body (check existence/metadata)	✅	✅	Cache validation
POST	Create a new resource; non-idempotent actions	❌	❌	Create, submit form, trigger action
PUT	Replace entire resource (upsert)	✅	❌	Full update
PATCH	Partial update	❌*	❌	Partial update
DELETE	Remove resource	✅	❌	Delete
OPTIONS	Discover allowed methods (used for CORS preflight)	✅	✅	CORS

* PATCH can be designed idempotently but is not required to be.

Safe = no side effects (read-only). Idempotent = making the same request N times has the same effect as making it once.

HTTP Status Codes¶

Range	Category	Key Codes
2xx	Success	200 OK, 201 Created, 202 Accepted, 204 No Content
3xx	Redirection	301 Moved Permanently, 304 Not Modified
4xx	Client Error	400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Entity, 429 Too Many Requests
5xx	Server Error	500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout

Common Status Code Mistakes

Never return 200 OK with an error in the body; otherwise clients must parse every body to detect errors
Use 401 for unauthenticated, 403 for authenticated but unauthorized
Use 422 (not 400) when the request is syntactically valid but semantically wrong (e.g. invalid field value)
404 means "resource not found", not "I don't know" — don't use it as a catch-all

PATCH Semantics: JSON Patch vs JSON Merge Patch¶

PATCH is the most nuanced HTTP method. The two dominant formats behave very differently:

JSON Merge Patch (RFC 7396) — simple, intuitive; send only the fields you want to change:

PATCH /users/42 HTTP/1.1
Content-Type: application/merge-patch+json

{"email": "alice.new@example.com", "phone": null}

Server merges the patch with the existing resource: email is updated, phone is removed (explicit null), all other fields are unchanged.

Limitation: you cannot set a field to null and leave it present — null always means "remove." This makes JSON Merge Patch unusable for APIs where null is a meaningful value.
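These rules are essentially the whole RFC 7396 algorithm, which is short enough to sketch. A minimal version (the `JsonValue` type and `mergePatch` name are ours); unlike the RFC's pseudocode, it returns a new value instead of mutating the target:

```typescript
type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

function isObject(v: JsonValue | undefined): v is { [key: string]: JsonValue } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// RFC 7396 MergePatch: objects merge recursively, null deletes, anything else replaces.
function mergePatch(target: JsonValue | undefined, patch: JsonValue): JsonValue {
  if (!isObject(patch)) return patch; // scalars and arrays replace the target wholesale
  const result: { [key: string]: JsonValue } = isObject(target) ? { ...target } : {};
  for (const [name, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[name]; // explicit null removes the member
    } else {
      result[name] = mergePatch(result[name], value);
    }
  }
  return result;
}
```

Because only objects are merged recursively, a patch such as `{"tags": ["c"]}` replaces the whole `tags` array, which is the array limitation listed in the comparison table.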

JSON Patch (RFC 6902) — explicit operations array, more powerful but more complex:

PATCH /users/42 HTTP/1.1
Content-Type: application/json-patch+json

[
  { "op": "replace", "path": "/email", "value": "alice.new@example.com" },
  { "op": "remove", "path": "/phone" },
  { "op": "add", "path": "/addresses/1", "value": {"city": "Berlin"} },
  { "op": "test", "path": "/version", "value": 3 }
]

Operations: add, remove, replace, move, copy, test. The test operation enables optimistic concurrency — the patch fails atomically if the tested value doesn't match.
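To make the all-or-nothing behavior concrete, here is a deliberately small applier. Assumptions: only `add`, `remove`, `replace`, and `test` are supported (no `move`/`copy`, no root-path operations), array index tokens are not checked for leading zeros, and `applyJsonPatch` is a hypothetical name. It patches a deep copy and returns it only if every operation succeeds, so a failing `test` leaves the caller's document untouched.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Op =
  | { op: "remove"; path: string }
  | { op: "add" | "replace" | "test"; path: string; value: Json };

const isObj = (v: Json | undefined): v is { [key: string]: Json } =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Structural equality, as the "test" operation requires (object member order ignored).
function deepEqual(a: Json, b: Json): boolean {
  if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) if (!deepEqual(a[i], b[i])) return false;
    return true;
  }
  if (isObj(a) && isObj(b)) {
    const keys = Object.keys(a);
    if (keys.length !== Object.keys(b).length) return false;
    for (const k of keys) if (!(k in b) || !deepEqual(a[k], b[k])) return false;
    return true;
  }
  return a === b;
}

// JSON Pointer (RFC 6901): "/a/b~1c" -> ["a", "b/c"]; decode ~1 before ~0.
function parsePointer(path: string): string[] {
  if (!path.startsWith("/")) throw new Error(`unsupported pointer: "${path}"`);
  return path.slice(1).split("/").map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
}

function applyJsonPatch(doc: Json, ops: Op[]): Json {
  const root = JSON.parse(JSON.stringify(doc)) as Json; // work on a copy: all or nothing
  for (const op of ops) {
    const tokens = parsePointer(op.path);
    const key = tokens.pop() as string;
    let parent: Json | undefined = root; // walk to the container of the target
    for (const t of tokens) {
      parent = Array.isArray(parent) ? parent[Number(t)] : isObj(parent) ? parent[t] : undefined;
    }
    const fail = (why: string): never => {
      throw new Error(`${op.op} ${op.path}: ${why}`);
    };
    if (Array.isArray(parent)) {
      const i = key === "-" ? parent.length : Number(key); // "-" means "append"
      const max = op.op === "add" ? parent.length : parent.length - 1;
      if (!Number.isInteger(i) || i < 0 || i > max) fail("index out of range");
      if (op.op === "remove") parent.splice(i, 1);
      else if (op.op === "add") parent.splice(i, 0, op.value);
      else if (op.op === "replace") parent[i] = op.value;
      else if (!deepEqual(parent[i], op.value)) fail("test failed");
    } else if (isObj(parent)) {
      const exists = Object.prototype.hasOwnProperty.call(parent, key);
      if (op.op !== "add" && !exists) fail("no such member");
      if (op.op === "remove") delete parent[key];
      else if (op.op === "add" || op.op === "replace") parent[key] = op.value;
      else if (!deepEqual(parent[key], op.value)) fail("test failed");
    } else {
      fail("parent not found");
    }
  }
  return root;
}
```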

Dimension	JSON Merge Patch	JSON Patch
RFC	7396	6902
Content-Type	`application/merge-patch+json`	`application/json-patch+json`
Format	Partial JSON object	Array of operations
Set field to null	❌ (null = remove)	✅ `{"op": "replace", "path": "/x", "value": null}`
Array operations	Replace entire array only	Add/remove individual elements
Atomicity	No built-in check	`test` operation for optimistic locking
Complexity	Low — just send partial object	Higher — must construct operation array
Adoption	More common (GitHub, Stripe)	Less common; used when precision needed

Practical Recommendation

Most APIs use JSON Merge Patch for simplicity. Use JSON Patch only when you need array element manipulation, optimistic concurrency via test, or the ability to distinguish "set to null" from "remove."

HATEOAS¶

Hypermedia as the Engine of Application State: the last sub-constraint of the uniform interface, and the one that defines the top level (Level 3) of the Richardson Maturity Model. Responses include hyperlinks that describe what actions are available next. Clients need no prior knowledge of URL structure; they navigate by following links.

{
  "id": 42,
  "name": "Alice",
  "email": "alice@example.com",
  "_links": {
    "self":   { "href": "/users/42", "method": "GET" },
    "orders": { "href": "/users/42/orders", "method": "GET" },
    "update": { "href": "/users/42", "method": "PUT" },
    "delete": { "href": "/users/42", "method": "DELETE" }
  }
}

Benefits: API is self-documenting; server can change URL structure without breaking clients; workflow steps are discoverable.

In practice: very few production APIs implement full HATEOAS. Most APIs reach Level 2 of the Richardson Maturity Model (proper HTTP verbs) and stop there.

Richardson Maturity Model¶

A framework for measuring how RESTful an API actually is:

Level	Name	What It Adds	Example
0	Swamp of POX	Single endpoint, single method	`POST /api` with XML body specifying action
1	Resources	Multiple URIs, but still single HTTP verb	`POST /users`, `POST /users/42`
2	HTTP Verbs	Uses GET/POST/PUT/DELETE meaningfully	`GET /users/42`, `DELETE /users/42`
3	Hypermedia	Responses contain links for navigation (HATEOAS)	JSON with `_links` section

Roy Fielding has stated that hypermedia (Level 3) is a precondition of REST. Most production APIs sit at Level 2, which is fine for practical purposes, even if technically not "truly RESTful."

GraphQL¶

Facebook created GraphQL in 2012 and open-sourced it in 2015. It is a query language for your API and a runtime for executing those queries — giving clients the power to ask for exactly what they need and nothing more.

Core Concept: Single Endpoint¶

Unlike REST's resource-per-endpoint model, GraphQL exposes a single endpoint (typically POST /graphql) that accepts queries describing the exact shape of data needed.

# REST needs several round trips (one comments request per post):
# GET /users/42
# GET /users/42/posts
# GET /posts/7/comments   (repeated for each post)

# GraphQL fetches all in one request:
query {
  user(id: 42) {
    name
    email
    posts(limit: 5) {
      title
      publishedAt
      comments(limit: 3) {
        body
        author { name }
      }
    }
  }
}

Type System and Schema¶

Everything in GraphQL is strongly typed. The schema is the single source of truth — it describes every piece of data the API can return and every operation clients can perform.

Scalar Types¶

Built-in primitives: Int, Float, String, Boolean, ID. Custom scalars can be defined (e.g., DateTime, URL, JSON).

Object Types¶

type User {
  id: ID!                  # ! = non-nullable
  name: String!
  email: String!
  createdAt: DateTime!
  posts: [Post!]!          # non-null list of non-null Posts
}

type Post {
  id: ID!
  title: String!
  body: String
  author: User!
  tags: [String!]!
}

Special Root Types¶

type Query {
  user(id: ID!): User
  users(limit: Int = 20, offset: Int = 0): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
  deleteUser(id: ID!): Boolean!
}

type Subscription {
  userCreated: User!
  messageReceived(roomId: ID!): Message!
}

Other Type Categories¶

Type	Purpose	Example
Input	Structured arguments (most often for mutations)	`input CreateUserInput { name: String!, email: String! }`
Enum	Fixed set of values	`enum Status { ACTIVE INACTIVE SUSPENDED }`
Interface	Shared fields across types	`interface Node { id: ID! }`
Union	Type can be one of many	`union SearchResult = User | Post | Comment`
Fragment	Reusable field selection (a query-side construct, not a schema type)	`fragment UserFields on User { id name email }`

Queries, Mutations, Subscriptions¶

Query — read data. Top-level query fields may be resolved in parallel:

query GetDashboard {
  currentUser {
    name
    notifications(unread: true) { id title }
  }
  trending { title views }
}

Mutation — write data. Top-level mutation fields are executed serially, in the order written:

mutation CreatePost($input: CreatePostInput!) {
  createPost(input: $input) {
    id
    title
    author { name }
  }
}

Subscription — real-time data via WebSocket (typically). Server pushes updates when events occur:

subscription OnMessageReceived($roomId: ID!) {
  messageReceived(roomId: $roomId) {
    id body sender { name } sentAt
  }
}

Resolvers¶

Resolvers are functions that produce data for each field in the schema. GraphQL execution is a depth-first traversal of the query tree — each field resolver receives:

parent — resolved value of the parent field
args — arguments passed to this field
context — shared object (DB connection, auth user, DataLoaders)
info — query metadata (field name, selection set, schema)

const resolvers = {
  Query: {
    user: async (_, { id }, { db }) => db.users.findById(id),
    users: async (_, { limit, offset }, { db }) =>
      db.users.findAll({ limit, offset }),
  },
  User: {
    // Parent resolver returned a user object; now resolve its posts field
    posts: async (user, { limit }, { db }) =>
      db.posts.findByUserId(user.id, limit),
  },
  Mutation: {
    createUser: async (_, { input }, { db }) => db.users.create(input),
  },
};

The N+1 Problem¶

The most common GraphQL performance trap. Without optimization, resolving a list of N users and their posts triggers 1 + N queries:

Query: users(limit: 20)    → SELECT * FROM users LIMIT 20          (1 query)
  User[0].posts            → SELECT * FROM posts WHERE user_id = 1  (1 query)
  User[1].posts            → SELECT * FROM posts WHERE user_id = 2  (1 query)
  ...
  User[19].posts           → SELECT * FROM posts WHERE user_id = 20 (1 query)
                                                                TOTAL: 21 queries

Real-world impact compounds with nesting — posts fetching authors fetching their posts can generate hundreds of queries for a single GraphQL request.

DataLoader — The Solution¶

Facebook's DataLoader batches and caches loads within a single request using Node.js's event loop tick:

import DataLoader from 'dataloader';

// Build fresh loaders per request (NOT once at application startup)
function createLoaders(db) {
  return {
    postsByUser: new DataLoader(async (userIds: readonly string[]) => {
      // Single batch query: SELECT * FROM posts WHERE user_id IN (1, 2, ..., 20)
      const posts = await db.posts.findByUserIds(userIds);
      // DataLoader requires one result per key, in the same order as the keys
      return userIds.map(id => posts.filter(p => p.userId === id));
    }),
  };
}

// Per-request context, e.g. context: async () => ({ db, loaders: createLoaders(db) })

// In the resolver, these 20 calls become ONE SQL query
const resolvers = {
  User: {
    posts: (user, _, { loaders }) =>
      loaders.postsByUser.load(user.id),  // batched automatically
  },
};

Result: 21 queries → 2 queries (one for users, one batch for all posts).

DataLoader Instance Per Request

Create a new DataLoader instance for each request. DataLoader memoizes every key it loads for the lifetime of the instance, so sharing one instance across requests serves stale data and can leak results between users with different permissions.
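To show what happens between `load()` and the batch function, here is a stripped-down batcher written from scratch. `TinyLoader` is hypothetical and is not the `dataloader` package's code (the real library uses a somewhat later scheduling point and supports options such as custom cache maps). It queues every key requested during the same synchronous run, flushes them as one batch on the microtask queue, and memoizes one promise per key, which is exactly why an instance should not outlive its request.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;
type QueuedLoad<K, V> = { key: K; resolve: (value: V) => void; reject: (error: unknown) => void };

class TinyLoader<K, V> {
  private readonly batchFn: BatchFn<K, V>;
  private readonly cache = new Map<K, Promise<V>>();
  private queue: QueuedLoad<K, V>[] = [];

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached) return cached; // memoized for the lifetime of this instance
    const promise = new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key in an empty queue schedules a flush after the current synchronous work.
      if (this.queue.length === 1) queueMicrotask(() => void this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      if (values.length !== batch.length) throw new Error("batch function must return one value per key");
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}
```

Loads issued after an `await` in a different resolver may land in a later batch; that is the trade-off any tick-based batcher makes.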

Directives¶

Directives annotate schema elements or control query execution:

type User {
  email: String! @deprecated(reason: "Use contactEmail instead")
  contactEmail: String!
  password: String! @auth(requires: ADMIN)  # custom directive
}

# Built-in execution directives:
query GetUser($showEmail: Boolean!, $skipPhone: Boolean!) {
  user(id: 42) {
    name
    email @include(if: $showEmail)   # conditionally include field
    phone @skip(if: $skipPhone)      # conditionally skip field
  }
}

Introspection¶

GraphQL APIs are self-documenting — clients can query the schema itself:

{
  __schema {
    types { name kind }
  }
  __type(name: "User") {
    fields { name type { name kind } }
  }
}

Introspection powers tools like GraphiQL, Apollo Studio, and GraphQL Playground. Disable introspection in production for security-sensitive APIs.

Query Complexity and Depth Limiting¶

Without limits, a malicious client can craft exponentially expensive queries:

# Denial-of-service via deeply nested query:
{ user { friends { friends { friends { friends { ... } } } } } }

Protect with:
- Depth limiting: reject queries deeper than N levels (graphql-depth-limit)
- Complexity analysis: assign costs to fields; reject queries over a budget (graphql-validation-complexity)
- Query whitelisting (persisted queries): only allow pre-approved queries in production
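Depth limiting and complexity analysis both walk the parsed query. As a sketch, assume the query has already been parsed into a hypothetical simplified tree (`FieldNode` below, not the `graphql-js` AST; fragments and variables are ignored) in which list fields carry a `first` or `limit` argument. Depth is the longest chain of nested selections; cost charges one unit per resolved field instance, multiplying by the page size of every list above it.

```typescript
interface FieldNode {
  name: string;
  args?: { first?: number; limit?: number };
  selections?: FieldNode[];
}

// Longest chain of nested selections, counting the field itself as one level.
function depth(field: FieldNode): number {
  const children = field.selections ?? [];
  return 1 + Math.max(0, ...children.map((child) => depth(child)));
}

// One unit per resolved instance; list page sizes multiply the cost of everything below them.
function complexity(field: FieldNode, multiplier = 1): number {
  const pageSize = field.args?.first ?? field.args?.limit ?? 1;
  const children = field.selections ?? [];
  return multiplier + children.reduce((sum, c) => sum + complexity(c, multiplier * pageSize), 0);
}

function checkQuery(root: FieldNode, maxDepth: number, maxCost: number): void {
  const d = depth(root);
  const c = complexity(root);
  if (d > maxDepth) throw new Error(`query depth ${d} exceeds limit ${maxDepth}`);
  if (c > maxCost) throw new Error(`query cost ${c} exceeds budget ${maxCost}`);
}
```

For the `user → posts(limit: 5) → comments(limit: 3) → author` query at the start of the GraphQL section, `depth` returns 5 and `complexity` returns 64, so a cost budget of 50 would reject it even though its depth is modest.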

Federation¶

GraphQL Federation lets multiple teams own separate subgraphs that compose into a unified supergraph — one schema, one endpoint, distributed implementation.

┌─────────────────────────────────────────────┐
│          Apollo Router (Supergraph)         │
│       Single endpoint: POST /graphql        │
└───────┬──────────────┬──────────────┬───────┘
        │              │              │
  ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
  │   Users   │  │ Products  │  │  Orders   │
  │ Subgraph  │  │ Subgraph  │  │ Subgraph  │
  │ (Team A)  │  │ (Team B)  │  │ (Team C)  │
  └───────────┘  └───────────┘  └───────────┘

Key concepts:
- Entities: types that can be extended across subgraphs, identified by a @key directive
- __resolveReference: resolver that hydrates an entity from a key passed by the router
- @external: field defined in another subgraph, referenced here
- Each subgraph is independently deployable; subgraph schemas are composed into the supergraph schema ahead of time, and the router uses it to plan and execute queries across subgraphs at request time

Federation vs Schema Stitching¶

Before Federation, schema stitching was the primary approach to composing multiple GraphQL services. They solve the same problem differently:

Dimension	Schema Stitching	Federation
Composition	Gateway merges schemas at runtime	Router composes via a supergraph schema
Type ownership	Gateway defines cross-service types	Each subgraph owns its types via `@key`
Coupling	Gateway knows about all subgraphs' internal types	Subgraphs are self-contained; router only knows entities
Deployment	Change in one subgraph may require gateway redeploy	Subgraphs deploy independently
Conflict resolution	Manual: gateway resolves field name conflicts	Declarative: `@override`, `@provides`, `@shareable` directives
Tooling	GraphQL Tools (`@graphql-tools/stitch`)	Apollo Router, Apollo Studio, Cosmo Router
Status	Still works; no longer recommended for new projects	Industry standard for multi-team GraphQL

When stitching still makes sense: small teams, legacy services being gradually migrated, or when you need to compose third-party GraphQL APIs you don't control (federation requires subgraphs to add @key directives).

Error Handling¶

GraphQL errors behave fundamentally differently from REST:

Partial responses — in REST, an error means the entire response fails. In GraphQL, individual fields can fail while the rest of the response succeeds:

{
  "data": {
    "user": {
      "name": "Alice",
      "email": "alice@example.com",
      "creditScore": null
    }
  },
  "errors": [
    {
      "message": "Unauthorized to access creditScore",
      "locations": [{ "line": 5, "column": 5 }],
      "path": ["user", "creditScore"],
      "extensions": {
        "code": "FORBIDDEN",
        "classification": "AUTHORIZATION"
      }
    }
  ]
}

The data field contains whatever succeeded; errors contains what failed. The client must handle both.

Error extensions — the extensions field is the standard way to add machine-readable error metadata:

// Apollo Server — throw typed error with extensions
import { GraphQLError } from 'graphql';

throw new GraphQLError('Order not found', {
  extensions: {
    code: 'NOT_FOUND',
    http: { status: 404 },
    orderId: input.id,
    traceId: ctx.traceId,
  },
});

Error masking — in production, mask internal errors to prevent leaking implementation details:

// Apollo Server 4: format errors for production
import { ApolloServer } from '@apollo/server';

const server = new ApolloServer({
  typeDefs,
  resolvers,
  formatError: (formattedError, error) => {
    // Log full error internally (`logger` stands for your application's logger)
    logger.error(error);
    // Return sanitized error to client
    if (formattedError.extensions?.code === 'INTERNAL_SERVER_ERROR') {
      return { message: 'Internal server error', extensions: { code: 'INTERNAL_SERVER_ERROR' } };
    }
    return formattedError;
  },
});

Error classification patterns:

Code	Meaning	HTTP Equivalent
`BAD_USER_INPUT`	Invalid argument or variable value	400
`UNAUTHENTICATED`	Missing or invalid auth	401
`FORBIDDEN`	Authenticated but not authorized	403
`NOT_FOUND`	Resource doesn't exist	404
`GRAPHQL_VALIDATION_FAILED`	Query doesn't match schema	400
`PERSISTED_QUERY_NOT_FOUND`	Unknown query hash (APQ miss)	400
`INTERNAL_SERVER_ERROR`	Unhandled server error	500

Caching¶

GraphQL caching is fundamentally harder than REST caching because requests typically use POST with dynamic query bodies, so HTTP caches can't distinguish between different queries to the same /graphql endpoint.

HTTP-level caching (limited):
- GET requests for queries: GET /graphql?query={user(id:42){name}}&variables={} can be cached by a CDN, but URL length limits apply
- Automatic Persisted Queries (APQ) solve this: GET /graphql?extensions={"persistedQuery":{"version":1,"sha256Hash":"abc..."}}&variables={"id":"42"} is short, cacheable, and CDN-friendly
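A sketch of the client side of APQ over GET, assuming Apollo's APQ conventions: the hash is the SHA-256 hex digest of the exact query text, sent as `extensions.persistedQuery` with `version: 1`. The function name `buildApqUrl` and the endpoint are hypothetical. On the first request for a hash the server answers `PERSISTED_QUERY_NOT_FOUND`; the client then retries once with the full `query` included so the server can store it, and later requests carry only the hash.

```typescript
import { createHash } from "node:crypto";

function buildApqUrl(endpoint: string, query: string, variables: Record<string, unknown>): string {
  // Any change to the query text (even whitespace) produces a different hash.
  const sha256Hash = createHash("sha256").update(query).digest("hex");
  const params = new URLSearchParams({
    extensions: JSON.stringify({ persistedQuery: { version: 1, sha256Hash } }),
    variables: JSON.stringify(variables),
  });
  // A short, deterministic URL is what lets a CDN key and cache the response.
  return `${endpoint}?${params.toString()}`;
}
```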

Client-side normalized caching (Apollo Client):

Apollo Client maintains an in-memory normalized cache keyed by __typename:id:

Cache store:
  User:42  → { __typename: "User", id: "42", name: "Alice", email: "alice@example.com" }
  Post:7   → { __typename: "Post", id: "7", title: "Hello", author: { __ref: "User:42" } }
  Post:8   → { __typename: "Post", id: "8", title: "World", author: { __ref: "User:42" } }

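When a mutation response includes the updated User:42 (its __typename, id, and changed fields), every active query displaying that user re-renders automatically, with no manual cache invalidation. This is the primary DX advantage of GraphQL over REST for complex frontends. It covers updates to existing entities; creating or deleting entities still requires updating the affected lists by hand (for example with refetchQueries or cache.modify).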
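The normalization step can be sketched in a few lines. This is an illustration, not Apollo Client's implementation: it assumes every entity carries `__typename` and `id`, uses the same `Typename:id` keys as the store shown above, and merges later results over earlier ones. `normalize`, `Entity`, and `Store` are hypothetical names.

```typescript
type Entity = { __typename: string; id: string; [field: string]: unknown };
type Store = Map<string, Record<string, unknown>>;

function isEntity(v: unknown): v is Entity {
  return typeof v === "object" && v !== null && "__typename" in v && "id" in v;
}

// Replace nested entities with { __ref } pointers and merge their fields into the store.
function normalize(value: unknown, store: Store): unknown {
  if (Array.isArray(value)) return value.map((item) => normalize(item, store));
  if (typeof value !== "object" || value === null) return value;
  const fields: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(value)) fields[k] = normalize(v, store);
  if (!isEntity(value)) return fields; // plain objects stay inline
  const key = `${value.__typename}:${value.id}`;
  store.set(key, { ...store.get(key), ...fields }); // newer fields overwrite older ones
  return { __ref: key };
}
```

Because both posts in the store above point at the single `User:42` entry, writing a new `name` there once is visible to every query that reads through either `__ref`.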

Cache policies:

Policy	Behavior	Use Case
`cache-first`	Read from cache; network only on miss	Default; best for mostly-static data
`network-only`	Always fetch; update cache	Dashboards, real-time displays
`cache-and-network`	Return cache immediately, then update with network	Instant UI + fresh data
`no-cache`	Fetch without reading or updating cache	One-off queries, sensitive data

Server-side caching:
- Response-level: cache full GraphQL responses keyed by query hash + variables (Redis)
- Resolver-level: cache individual resolver results (DataLoader already provides per-request caching; add Redis for cross-request caching)
- @cacheControl directive (Apollo): per-field cache hints

type Product @cacheControl(maxAge: 3600) {
  id: ID!
  name: String!
  price: Float! @cacheControl(maxAge: 60)    # price changes more often
  reviews: [Review!]! @cacheControl(maxAge: 300)
}

gRPC¶

gRPC (Google Remote Procedure Call) is a high-performance, open-source RPC framework that uses Protocol Buffers as its interface definition language and serialization format, and HTTP/2 as the transport protocol. It has been a CNCF project since 2017.

Protocol Buffers (Protobuf)¶

Protobuf is a language-neutral, platform-neutral binary serialization format. Compared to JSON:

Property	JSON	Protobuf
Format	Text (UTF-8)	Binary
Size	~1x baseline	Typically 3–10x smaller (payload-dependent)
Parse speed	~1x baseline	Typically 5–10x faster (implementation-dependent)
Schema	Optional (JSON Schema)	Required (`.proto` file)
Human-readable	✅	❌ (need tools)
Schema evolution	Manual / fragile	Built-in field numbering

A .proto service definition:

syntax = "proto3";
package com.example.users;

// Message types
message User {
  string id        = 1;
  string name      = 2;
  string email     = 3;
  int64  created_at = 4;
}

message GetUserRequest  { string user_id = 1; }
message CreateUserRequest {
  string name  = 1;
  string email = 2;
}
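message UserList { repeated User users = 1; }
message ListUsersRequest { int32 page_size = 1; }               // illustrative fields
message ChatMessage { string sender_id = 1; string text = 2; }  // illustrative fields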

// Service definition
service UserService {
  // Unary
  rpc GetUser(GetUserRequest) returns (User);

  // Server streaming
  rpc ListUsers(ListUsersRequest) returns (stream User);

  // Client streaming
  rpc CreateUsersBulk(stream CreateUserRequest) returns (UserList);

  // Bidirectional streaming
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

The protoc compiler generates strongly-typed client stubs and server interfaces in Go, Java, Python, C++, Node.js, Rust, Kotlin, Swift, and more.
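Where the size and field-numbering advantages come from is easiest to see in the wire format. This hand-rolled encoder for the `User` message above is a sketch, not a replacement for generated code: it handles only wire type 0 (varint, non-negative values only) and wire type 2 (length-delimited). Each field is written as a key `(field_number << 3) | wire_type` followed by its value; field names never reach the wire, which is why renaming a field is safe but renumbering it is not.

```typescript
// Base-128 varint for non-negative integers: 7 bits per byte, low groups first,
// high bit set on every byte except the last.
function varint(n: number): number[] {
  const out: number[] = [];
  while (n > 0x7f) {
    out.push((n % 128) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// Field key = (field_number << 3) | wire_type.
const key = (field: number, wireType: number): number[] => varint(field * 8 + wireType);

// Wire type 2: key, byte length, UTF-8 bytes. proto3 omits fields holding default values.
function stringField(field: number, value: string): number[] {
  if (value === "") return [];
  const bytes = Array.from(new TextEncoder().encode(value));
  return [...key(field, 2), ...varint(bytes.length), ...bytes];
}

// Wire type 0: key, then the value as a varint.
function uintField(field: number, value: number): number[] {
  if (value === 0) return [];
  return [...key(field, 0), ...varint(value)];
}

// message User { string id = 1; string name = 2; string email = 3; int64 created_at = 4; }
function encodeUser(u: { id: string; name: string; email: string; createdAt: number }): Uint8Array {
  return new Uint8Array([
    ...stringField(1, u.id),
    ...stringField(2, u.name),
    ...stringField(3, u.email),
    ...uintField(4, u.createdAt),
  ]);
}
```

For `{ id: "42", name: "Al", email: "", createdAt: 300 }` this produces 11 bytes: the empty `email` is omitted entirely, and `300` needs only two varint bytes (`0xAC 0x02`).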

HTTP/2 Features Exploited by gRPC¶

HTTP/2 Feature	What It Enables
Multiplexing	Multiple RPC calls on one TCP connection; no HTTP-level head-of-line blocking between requests (TCP-level head-of-line blocking still applies)
Binary framing	Headers and data sent as binary frames, more efficient than HTTP/1.1 text headers
Header compression (HPACK)	Repeated headers (auth token, content-type) sent as index references after first use, substantially shrinking per-call header overhead
Full-duplex streams	Client and server can send frames simultaneously on the same stream
Flow control	Prevents fast producers from overwhelming slow consumers per-stream
Server push	Server can pre-emptively send resources (not used by gRPC)

The Four Streaming Types¶

Unary RPC¶

rpc GetUser(GetUserRequest) returns (User);

Classic request-response. Client sends one message, server sends one message. Analogous to a single REST request/response.

Server Streaming RPC¶

rpc WatchLogs(WatchRequest) returns (stream LogEntry);

Client sends one request; server streams multiple responses. Useful for: live logs, large dataset export, real-time feeds.

Client Streaming RPC¶

rpc UploadMetrics(stream MetricPoint) returns (UploadSummary);

Client streams multiple messages; server collects them and returns one response. Useful for: telemetry ingestion, file uploads chunked by the client, batch writes.

Bidirectional Streaming RPC¶

rpc BidirectionalChat(stream ChatMessage) returns (stream ChatMessage);

Both sides can send and receive messages in any order over a long-lived connection. Both streams operate independently. Useful for: chat, collaborative editing, real-time games, audio/video signaling.

Deadlines and Cancellation¶

Every gRPC call should set a deadline: the absolute time by which the client requires a response. Servers should check whether the deadline has already passed before starting expensive work.

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "42"})

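Deadlines propagate through the call chain when each service passes its incoming context to its outgoing calls (as with ctx in Go): if service A calls service B, which calls service C, all three share the same absolute deadline, so downstream services stop working on requests whose caller has already given up instead of wasting capacity.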
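On the wire, gRPC over HTTP/2 sends the remaining time as a `grpc-timeout` request header: a positive integer of at most 8 digits followed by a unit (`H`, `M`, `S`, `m`, `u`, `n`). Each hop recomputes the remaining budget from the absolute deadline before calling downstream, which is how the window shrinks along A → B → C. gRPC libraries do this for you; the helpers below (`grpcTimeoutHeader`, `outgoingTimeout`) are a hypothetical sketch of that encoding, choosing the finest unit that fits and rounding up.

```typescript
// Units from finest to coarsest, with their size in nanoseconds.
const NS_PER_UNIT: [string, number][] = [
  ["n", 1], ["u", 1e3], ["m", 1e6], ["S", 1e9], ["M", 6e10], ["H", 3.6e12],
];

function grpcTimeoutHeader(remainingMs: number): string {
  if (remainingMs <= 0) throw new Error("deadline exceeded: fail fast instead of calling downstream");
  const remainingNs = Math.ceil(remainingMs * 1e6);
  for (const [unit, nsPerUnit] of NS_PER_UNIT) {
    const value = Math.ceil(remainingNs / nsPerUnit);
    if (value <= 99_999_999) return `${value}${unit}`; // at most 8 digits
  }
  throw new Error("timeout too large for grpc-timeout");
}

// Every hop derives its outgoing header from the same absolute deadline.
function outgoingTimeout(deadlineEpochMs: number, nowEpochMs: number): string {
  return grpcTimeoutHeader(deadlineEpochMs - nowEpochMs);
}
```

A call whose deadline is 5 s away is sent as `5000000u`; if the downstream call is made 1 s later, the same deadline yields `4000000u`.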

Interceptors¶

Interceptors wrap gRPC method invocations — the gRPC equivalent of middleware:

// Unary server interceptor for logging
func loggingInterceptor(ctx context.Context, req interface{},
  info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (interface{}, error) {
  start := time.Now()
  resp, err := handler(ctx, req)
  log.Printf("Method: %s | Duration: %v | Error: %v",
    info.FullMethod, time.Since(start), err)
  return resp, err
}

// Register (streamLoggingInterceptor is a grpc.StreamServerInterceptor defined elsewhere):
s := grpc.NewServer(
  grpc.UnaryInterceptor(loggingInterceptor),
  grpc.StreamInterceptor(streamLoggingInterceptor),
)

Common interceptors: authentication, tracing (OpenTelemetry), logging, metrics, panic recovery, rate limiting, deadline enforcement.
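Interceptors compose like middleware: each receives the request plus a `next` handler and decides what to do before and after calling it. The chaining logic is the same in any language; the types below (`Handler`, `Interceptor`, `chain`) are hypothetical and not a gRPC library API. The first interceptor in the list runs outermost, which is also the order grpc-go documents for `grpc.ChainUnaryInterceptor`.

```typescript
type Handler<Req, Res> = (req: Req) => Promise<Res>;
type Interceptor<Req, Res> = (req: Req, next: Handler<Req, Res>) => Promise<Res>;

// chain([a, b], h) behaves like (req) => a(req, (r) => b(r, h)): the first interceptor is outermost.
function chain<Req, Res>(interceptors: Interceptor<Req, Res>[], handler: Handler<Req, Res>): Handler<Req, Res> {
  return interceptors.reduceRight<Handler<Req, Res>>(
    (next, interceptor) => (req) => interceptor(req, next),
    handler,
  );
}
```

An auth interceptor placed inside a logging interceptor can reject a call without reaching the handler, while the logger still observes the failure on the way out.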

Load Balancing¶

Because gRPC multiplexes many RPCs over a single TCP connection, L4 (TCP) load balancing distributes connections, not RPCs. Every RPC from a client instance rides its one long-lived connection to a single pod of service B, so the other pods receive none of that client's traffic.

Solutions:
- L7 (application-layer) load balancing: a proxy that understands HTTP/2 streams distributes individual RPCs (Envoy, nginx, gRPC-aware load balancers)
- Client-side load balancing: the gRPC client resolves all backend IPs (via DNS), maintains connections to each, and distributes RPCs itself
- Headless services in Kubernetes: DNS returns all pod IPs, combined with gRPC client-side round-robin
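The client-side approach comes down to keeping one connection per backend and rotating RPCs across them. A minimal round-robin picker, assuming the address list comes from a resolver such as a headless-service DNS lookup; `RoundRobinPicker` is hypothetical, and real gRPC clients also track per-connection health:

```typescript
class RoundRobinPicker {
  private addresses: string[];
  private next = 0;

  constructor(addresses: string[]) {
    this.addresses = addresses;
  }

  // Called when the resolver reports a new backend set (e.g. pods scaled up or down).
  update(addresses: string[]): void {
    this.addresses = addresses;
    this.next = 0;
  }

  // Chooses the backend for the next RPC, not the next connection.
  pick(): string {
    if (this.addresses.length === 0) throw new Error("no backends available");
    const address = this.addresses[this.next % this.addresses.length];
    this.next = (this.next + 1) % this.addresses.length;
    return address;
  }
}
```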

gRPC-Web (Browser Bridge)¶

Browsers cannot make native HTTP/2 gRPC calls (no access to HTTP/2 frames or trailers). gRPC-Web bridges this gap with a protocol translation proxy.

flowchart LR
    B[Browser\ngRPC-Web Client] -->|HTTP/1.1 or HTTP/2\nContent-Type: application/grpc-web| P[Envoy Proxy\ngRPC-Web Filter]
    P -->|Native HTTP/2 gRPC| S[gRPC Server]

How it works:
1. The browser client uses grpc-web or @connectrpc/connect-web to make gRPC calls
2. Calls are encoded as application/grpc-web (binary) or application/grpc-web-text (base64) over standard HTTP
3. Envoy proxy (or a Connect protocol server) translates them to native gRPC
4. The server sees standard gRPC requests, so no code changes are needed

// Browser client using Connect-ES v2 (modern alternative to grpc-web)
import { createClient } from "@connectrpc/connect";
import { createGrpcWebTransport } from "@connectrpc/connect-web";
import { UserService } from "./gen/users_pb"; // generated by protoc-gen-es v2

const transport = createGrpcWebTransport({
  baseUrl: "https://api.example.com",
});

const client = createClient(UserService, transport);
const user = await client.getUser({ userId: "42" });

gRPC-Web limitations:
- Only unary and server-streaming RPCs (no client-streaming or bidirectional)
- Requires a proxy (Envoy, Connect, nginx) unless using Connect protocol natively
- Slightly higher latency due to protocol translation

Connect (from Buf) is the modern alternative: Connect servers speak gRPC, gRPC-Web, and Buf's own Connect protocol on the same HTTP endpoints, so browsers can call them directly (using the Connect or gRPC-Web wire format) without a translating proxy.
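Under both native gRPC and gRPC-Web, each protobuf message travels in a length-prefixed frame: one flag byte followed by a 4-byte big-endian length. In gRPC-Web, the response trailers (`grpc-status`, `grpc-message`) are sent as one more frame in the body with the high bit of the flag byte set (`0x80`), since browsers cannot read HTTP/2 trailers. A sketch of encoding and decoding, assuming the binary (not base64 `grpc-web-text`) variant; `encodeFrame` and `decodeFrames` are hypothetical names:

```typescript
interface GrpcFrame {
  isTrailer: boolean;  // MSB of the flag byte: gRPC-Web trailers frame
  compressed: boolean; // LSB of the flag byte
  payload: Uint8Array;
}

function encodeFrame(payload: Uint8Array, flags = 0): Uint8Array {
  const frame = new Uint8Array(5 + payload.length);
  const view = new DataView(frame.buffer);
  view.setUint8(0, flags);
  view.setUint32(1, payload.length); // DataView writes big-endian by default
  frame.set(payload, 5);
  return frame;
}

function decodeFrames(body: Uint8Array): GrpcFrame[] {
  const frames: GrpcFrame[] = [];
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength);
  let offset = 0;
  while (offset < body.length) {
    if (body.length - offset < 5) throw new Error("truncated frame header");
    const flags = view.getUint8(offset);
    const length = view.getUint32(offset + 1);
    const start = offset + 5;
    if (start + length > body.length) throw new Error("truncated frame payload");
    frames.push({
      isTrailer: (flags & 0x80) !== 0,
      compressed: (flags & 0x01) !== 0,
      payload: body.subarray(start, start + length),
    });
    offset = start + length;
  }
  return frames;
}
```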

SOAP / XML-RPC¶

SOAP (Simple Object Access Protocol) is an XML-based messaging protocol that dominated enterprise web services before REST became mainstream. It is still deeply embedded in enterprise systems, financial services, healthcare (HL7), and government integrations.

Protocol Structure¶

A SOAP message is an XML document with a mandatory Envelope, optional Header, and mandatory Body:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:usr="http://example.com/users">
  <soap:Header>
    <usr:AuthToken>abc123</usr:AuthToken>
  </soap:Header>
  <soap:Body>
    <usr:GetUser>
      <usr:UserId>42</usr:UserId>
    </usr:GetUser>
  </soap:Body>
</soap:Envelope>

WSDL (Web Services Description Language)¶

WSDL is SOAP's IDL — an XML document that describes the service completely: operations, input/output message types, bindings (how operations map to protocols), and endpoints. It serves the same role as OpenAPI for REST or .proto files for gRPC.

<wsdl:definitions name="UserService" ...>
  <wsdl:types>
    <xs:schema>
      <xs:element name="GetUserRequest">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="UserId" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </wsdl:types>
  <wsdl:message name="GetUserInput">
    <wsdl:part name="parameters" element="tns:GetUserRequest"/>
  </wsdl:message>
  <wsdl:portType name="UserServicePortType">
    <wsdl:operation name="GetUser">
      <wsdl:input message="tns:GetUserInput"/>
      <wsdl:output message="tns:GetUserOutput"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>

SOAP vs REST¶

Dimension	SOAP	REST
Payload	XML (verbose)	JSON (compact)
Contract	WSDL (machine-readable)	OpenAPI (optional)
Transport	HTTP, SMTP, TCP	HTTP only
State	Stateful or stateless	Stateless
Security	WS-Security (powerful but complex)	OAuth 2.0, JWT, mTLS
Error handling	`soap:Fault` (standardized)	HTTP status codes (convention-based)
Tooling	Mature but heavy	Light and universal
Still used for	Banking, insurance, health (HL7), government	Virtually everything new

XML-RPC predates SOAP — a simpler, less extensible ancestor using XML payloads over HTTP POST. Effectively obsolete.

WebSocket¶

WebSocket provides a persistent, full-duplex channel over a single TCP connection, established via an HTTP upgrade handshake. Once established, either side can send messages at any time with minimal overhead.

Handshake¶

# Client initiates upgrade:
GET /ws HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

# Server confirms upgrade:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
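The `Sec-WebSocket-Accept` value is not arbitrary. Per RFC 6455, the server:

1. appends a fixed GUID to the client's key,
2. hashes the result with SHA-1,
3. base64-encodes the digest.

A short Python sketch follows; the function name is illustrative.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455, section 1.3
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# The key from the handshake above yields the accept value in the 101 response
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

This check only proves that the server understood the WebSocket handshake; it is not an authentication mechanism. Clients reject a 101 response whose accept value does not match.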

After the handshake, the connection is no longer HTTP. Data flows as frames — the minimal overhead unit:

Frame Type	Description
Text frame	UTF-8 text message
Binary frame	Raw bytes (audio, video, protobuf)
Ping frame	Heartbeat probe (either endpoint may send; usually server → client)
Pong frame	Heartbeat response
Close frame	Graceful connection termination
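To make the framing concrete, here is a minimal Python sketch of the RFC 6455 header for a single unfragmented frame. It shows three parts:

- the FIN bit plus opcode,
- a length field that grows to 16 or 64 bits for larger payloads,
- the 4-byte XOR mask that every client-to-server frame must carry.

`encode_frame` and `random_mask` are hypothetical helpers, not a library API.

```python
import os
import struct
from typing import Optional

# Opcodes from RFC 6455, section 5.2
OPCODES = {"text": 0x1, "binary": 0x2, "close": 0x8, "ping": 0x9, "pong": 0xA}


def encode_frame(payload: bytes, kind: str = "text", mask_key: Optional[bytes] = None) -> bytes:
    """Encode one unfragmented frame; pass a 4-byte mask_key for client-to-server frames."""
    opcode = OPCODES[kind]
    length = len(payload)
    if opcode >= 0x8 and length > 125:
        raise ValueError("control frames (close/ping/pong) carry at most 125 bytes")
    first = 0x80 | opcode  # FIN=1: this frame carries the whole message
    mask_bit = 0x80 if mask_key is not None else 0x00
    if length < 126:
        header = struct.pack("!BB", first, mask_bit | length)
    elif length < 2**16:
        header = struct.pack("!BBH", first, mask_bit | 126, length)  # 16-bit extended length
    else:
        header = struct.pack("!BBQ", first, mask_bit | 127, length)  # 64-bit extended length
    if mask_key is None:
        return header + payload  # server-to-client frames are sent unmasked
    # Client-to-server: XOR each payload byte with the rotating 4-byte masking key
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


def random_mask() -> bytes:
    """Clients must use a fresh, unpredictable masking key for every frame."""
    return os.urandom(4)


print(encode_frame(b"Hello").hex())  # 810548656c6c6f
```

The two-byte header for a short text message is why WebSocket overhead per message is so small compared with a full HTTP request.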

Connection Management¶

The primary operational challenge of WebSocket is connection state management:

Heartbeats (ping/pong): detect dead connections that appear open at the TCP layer. Servers should send pings every 30–60s; if no pong arrives, close and clean up.
Reconnection: clients should implement exponential backoff when the connection drops. Libraries like reconnecting-websocket handle this automatically.
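Backpressure: if a slow client can't consume fast enough, the server's send buffer fills. Track each connection's outgoing buffer on the server and pause or drop sends when it grows too large, or implement application-level flow control. (In the browser, ws.bufferedAmount reports the client's own queued outgoing bytes.)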
Horizontal scaling: WebSocket connections are stateful and sticky. A message sent by user A (connected to server 1) destined for user B (connected to server 2) must be routed between servers via a pub/sub layer (Redis Pub/Sub, Kafka).
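The reconnection advice translates into a small delay schedule. Below is a sketch of capped exponential backoff with "full jitter": each delay is random between zero and the exponential ceiling. The jitter keeps thousands of clients from reconnecting in lockstep after a server restart. The base, cap, and function names are illustrative assumptions.

```python
import random
from typing import List, Optional


def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound (seconds) for the delay before reconnect attempt `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)


def reconnect_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Full jitter: pick each delay uniformly in [0, ceiling]."""
    rng = rng or random.Random()
    return [rng.uniform(0, backoff_ceiling(a, base, cap)) for a in range(attempts)]


if __name__ == "__main__":
    for attempt, delay in enumerate(reconnect_delays(6)):
        print(f"attempt {attempt}: wait {delay:.2f}s")
```

Reset the attempt counter once a connection has stayed healthy for a while, so that a later drop starts again from a short delay.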

When to Use WebSocket¶

Interactive real-time features: chat, collaborative document editing, multiplayer gaming
Financial data: live order books, tick-by-tick price feeds
IoT: bidirectional device control with low latency
When the client sends frequent data to the server (>1 msg/second)

Server-Sent Events (SSE)¶

SSE is a browser standard for server-to-client streaming over plain HTTP, specified in the WHATWG HTML Living Standard (it began as a W3C specification). Unlike WebSocket, there is no protocol upgrade: it's just a long-lived HTTP response with Content-Type: text/event-stream.

Protocol¶

Server response:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

id: 1
event: message
data: {"type": "notification", "text": "Hello!"}

id: 2
event: update
data: {"user": "alice", "status": "online"}

: heartbeat comment (ignored by client)

SSE message fields:

Field	Purpose
data:	The message payload (can span multiple lines)
event:	Custom event type (client listens via addEventListener)
id:	Message ID; sent as Last-Event-ID header on reconnect
: (comment)	Ignored by client; used for keepalive pings
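The field rules are simple enough to parse by hand, which helps when a consumer cannot use `EventSource` (for example, a non-browser client). Below is a simplified Python sketch of the core rules:

- a blank line dispatches the buffered event,
- `:` lines are comments,
- multiple `data:` lines are joined with newlines,
- `event` defaults to `message`.

It ignores `retry:` and other edge cases of the full HTML specification. `parse_sse` and `SSEEvent` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(stream: str) -> List[SSEEvent]:
    """Parse a text/event-stream body into events (simplified)."""
    events: List[SSEEvent] = []
    data_lines: List[str] = []
    event_type = ""
    last_id: Optional[str] = None
    # The spec accepts CRLF, LF, or CR as line terminators
    for line in re.split(r"\r\n|\r|\n", stream):
        if line == "":
            # Blank line: dispatch the buffered event, if it carried any data
            if data_lines:
                events.append(SSEEvent(event_type or "message", "\n".join(data_lines), last_id))
            data_lines, event_type = [], ""
            continue
        if line.startswith(":"):
            continue  # comment, e.g. a keepalive ping
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value  # persists across events; sent back as Last-Event-ID
    # An event not terminated by a blank line is incomplete and is discarded
    return events
```

Each event's `data` is then decoded by the application (JSON in the example stream above).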

Auto-Reconnection¶

SSE's killer feature: if the connection drops, the browser automatically reconnects and sends the Last-Event-ID header — the server can resume from where it left off. No client code required.

const source = new EventSource('/events');

source.addEventListener('message', e => console.log(e.data));
source.addEventListener('update', e => handleUpdate(JSON.parse(e.data)));
source.onerror = e => console.error('SSE error', e);
// Reconnection happens automatically — no manual retry logic needed
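Automatic reconnection only helps if the server honors `Last-Event-ID`. Below is a minimal server-side sketch. It assumes events are kept in an in-memory list of `(id, event, data)` tuples with increasing integer IDs; a real service would use a bounded buffer or a durable log. `format_sse` and `replay_since` are hypothetical helpers. Treating a missing or unparseable header as "replay the whole buffer" is an assumption; a feed might prefer to send nothing old.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str, str]  # (id, event type, data)


def format_sse(event_id: int, event: str, data: str) -> str:
    """Serialize one event in text/event-stream format."""
    lines = [f"id: {event_id}", f"event: {event}"]
    # Multi-line payloads need one data: line per line
    lines += [f"data: {part}" for part in data.split("\n")]
    return "\n".join(lines) + "\n\n"  # blank line terminates the event


def replay_since(log: List[Event], last_event_id: Optional[str]) -> str:
    """Return every buffered event newer than the client's Last-Event-ID header."""
    try:
        after = int(last_event_id) if last_event_id else 0
    except ValueError:
        after = 0  # assumption: garbled ID means replay the whole buffer
    return "".join(format_sse(i, e, d) for i, e, d in log if i > after)
```

On reconnect, the handler writes `replay_since(...)` first and then continues with live events on the same response.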

HTTP/2 SSE¶

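Under HTTP/1.1, browsers allow only about 6 concurrent connections per origin, shared across all tabs. Each open EventSource holds one of them. With the same page open in 7 tabs, the seventh stream cannot connect, and the open streams starve ordinary fetch/XHR requests to that origin. Under HTTP/2, all SSE streams multiplex over a single TCP connection, so this limit effectively disappears; the server's concurrent-stream limit (often 100 or more) applies instead.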

AI Streaming

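SSE is the de facto format for LLM token streaming. OpenAI, Anthropic, and most other LLM APIs stream completions as SSE because data flows in one direction (server → client) and SSE is simpler than WebSocket. These APIs are called with POST, which EventSource does not support. Clients therefore usually read the stream through fetch or an official SDK, and reconnection is up to that client code rather than the browser.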

Webhooks¶

Webhooks are HTTP POST callbacks — the server pushes events to client-registered URLs instead of the client polling for changes. "Don't call us, we'll call you."

Flow¶

sequenceDiagram
    participant Client
    participant YourServer
    participant WebhookConsumer

    Client->>YourServer: Register webhook URL
    Note over YourServer: Event occurs (payment, commit, signup)
    YourServer->>WebhookConsumer: POST /webhook {"event": "payment.succeeded", ...}
    WebhookConsumer-->>YourServer: 200 OK (within 5s)
    Note over WebhookConsumer: Queue event for async processing

Production Webhook Pattern¶

Respond immediately, process asynchronously:

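import json, os
from fastapi import BackgroundTasks, FastAPI, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]

@app.post("/webhook")
async def webhook_handler(request: Request, background_tasks: BackgroundTasks):
    body = await request.body()  # raw bytes: the signature covers these exact bytes
    # 1. Validate signature FIRST (verify_signature is defined below)
    if not verify_signature(request.headers, body, WEBHOOK_SECRET):
        raise HTTPException(status_code=401, detail="invalid signature")
    # 2. Queue the work and return 200 immediately (process_event is your own handler)
    background_tasks.add_task(process_event, json.loads(body))
    return {"status": "accepted"}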

Never do slow work (DB queries, API calls) in the webhook handler. Return a 2xx within the provider's timeout (often just a few seconds; the limit varies by provider), or the delivery is treated as failed and retried.

Security: Signature Verification¶

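Every webhook provider should sign payloads. Verify the signature against the raw request body (not re-serialized JSON) before processing:

import hashlib
import hmac
from collections.abc import Mapping

def verify_signature(headers: Mapping[str, str], body: bytes, secret: str) -> bool:
    # Recompute HMAC-SHA256 over the raw body with the shared secret
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Header name is illustrative; the "sha256=" prefix mirrors GitHub's X-Hub-Signature-256
    received = headers.get("X-Signature-256", "").removeprefix("sha256=")
    # Constant-time comparison avoids leaking how much of the signature matched
    return hmac.compare_digest(expected.encode(), received.encode())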

Reliability Patterns¶

Pattern	Purpose
Idempotency key	Deduplicate retried deliveries — store processed event IDs
Exponential backoff retries	Sender retries on non-2xx responses or timeouts, with growing delays (e.g., immediately, 5s, 30s, 5m, 30m, 2h)
Dead letter queue	After N retries, move to DLQ for manual inspection
Event replay	Allow consumers to re-request past events by ID
CloudEvents format	CNCF standard envelope: required `id`, `source`, `specversion`, `type`; optional `time`, `data`, and others
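The idempotency-key row is the one consumers most often get wrong: because senders retry, the same event can arrive twice. Below is a minimal sketch of an idempotent consumer. It assumes each delivery carries a unique event `id` (as CloudEvents envelopes do). It uses an in-memory set where production code would use a database unique constraint or a Redis `SET NX`. All names are hypothetical.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Process each webhook event at most once, keyed by its event ID."""

    def __init__(self, handler: Callable[[Dict], None]):
        self._handler = handler
        self._processed: Set[str] = set()  # production: durable store with a unique constraint

    def receive(self, event: Dict) -> str:
        event_id = event["id"]
        if event_id in self._processed:
            return "duplicate"  # still acknowledge with 2xx so the sender stops retrying
        self._handler(event)
        self._processed.add(event_id)  # record only after successful processing
        return "processed"
```

Recording the ID only after the handler succeeds means a crash mid-processing leads to a retry rather than a lost event, so the handler should tolerate partial re-runs. With concurrent workers, the check-and-record step must be atomic (for example, an insert that fails on a duplicate key).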

tRPC¶

tRPC lets TypeScript full-stack teams build APIs where type safety flows automatically from server to client — no code generation, no schema files, no out-of-sync types.

How It Works¶

1. Define procedures on the server (TypeScript functions)
2. Export the router's type
3. Import and use that type on the client
4. TypeScript infers input/output types automatically

The client never imports server implementation code — only the type. At runtime, tRPC serializes calls over HTTP (queries → GET/POST, mutations → POST, subscriptions → WebSocket).

Routers and Procedures¶

// server/routers/users.ts
import { z } from 'zod';
import { router, publicProcedure, protectedProcedure } from '../trpc';

export const userRouter = router({
  // Query — GET /trpc/users.getById
  getById: publicProcedure
    .input(z.object({ id: z.string() }))
    .query(async ({ input, ctx }) => {
      return ctx.db.user.findUnique({ where: { id: input.id } });
    }),

  // Mutation — POST /trpc/users.create
  create: protectedProcedure
    .input(z.object({ name: z.string(), email: z.string().email() }))
    .mutation(async ({ input, ctx }) => {
      return ctx.db.user.create({ data: input });
    }),
});

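// server/routers/_app.ts
import { router } from '../trpc';
import { userRouter } from './users';
import { postRouter } from './posts';        // defined like userRouter (not shown)
import { commentRouter } from './comments';  // defined like userRouter (not shown)

export const appRouter = router({
  users: userRouter,
  posts: postRouter,
  comments: commentRouter,
});

export type AppRouter = typeof appRouter;  // ← this is all the client needs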

Client Usage¶

// client/trpc.ts
import { createTRPCReact } from '@trpc/react-query';
import type { AppRouter } from '../server/routers/_app';

export const trpc = createTRPCReact<AppRouter>();

// In a React component (.tsx file):
function UserProfile({ userId }: { userId: string }) {
  // Fully typed: input, output, and error are all inferred from server code
  const { data, isLoading } = trpc.users.getById.useQuery({ id: userId });
  // data is typed as: User | null | undefined
  // Change server return type → TypeScript error here immediately
  if (isLoading) return <p>Loading...</p>;
  return <h1>{data?.name ?? 'User not found'}</h1>;
}

Context and Middleware¶

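// server/trpc.ts
import { initTRPC, TRPCError } from '@trpc/server';
import type { CreateNextContextOptions } from '@trpc/server/adapters/next';
import { getSession } from 'next-auth/react';
import { prisma } from './db';  // your PrismaClient instance (path is illustrative)

// Context: per-request shared state (auth user, DB, etc.)
export const createContext = async ({ req }: CreateNextContextOptions) => ({
  db: prisma,
  session: await getSession({ req }),
});
export type Context = Awaited<ReturnType<typeof createContext>>;

const t = initTRPC.context<Context>().create();
export const router = t.router;
export const publicProcedure = t.procedure;

// Middleware: wraps procedures with reusable logic
const isAuthenticated = t.middleware(({ ctx, next }) => {
  if (!ctx.session?.user) throw new TRPCError({ code: 'UNAUTHORIZED' });
  return next({ ctx: { ...ctx, user: ctx.session.user } });
});

// Protected procedure: any procedure using this is automatically auth-gated
export const protectedProcedure = publicProcedure.use(isAuthenticated);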

tRPC vs Alternatives¶

Dimension	tRPC	REST + OpenAPI	GraphQL
Type safety	✅ Automatic, zero-gen	⚠️ Code generation required	⚠️ Code generation required
Language support	TypeScript/JS only	Universal	Universal
Schema file	❌ None (types are the schema)	OpenAPI YAML/JSON	`.graphql` SDL
Learning curve	Low (just TypeScript)	Low	High
Client flexibility	❌ Must use tRPC client	✅ Any HTTP client	✅ Any GraphQL client
Over/under-fetching	Field selection not built-in	Full response always	✅ Client specifies fields
Best for	TypeScript monorepos (T3 stack, Next.js)	Public APIs, polyglot	Complex multi-client frontends

Choosing the Right API Paradigm¶

Is this a public API consumed by external developers or third parties?
→ REST (universal, familiar, broad tooling)

Is the frontend complex with multiple clients fetching different data shapes?
→ GraphQL (eliminates over/under-fetching, empowers frontend teams)

Is this internal service-to-service communication with high throughput?
→ gRPC (fastest, binary, streaming support, code-gen clients)

Does the data need to flow in real time in both directions?
→ WebSocket (full-duplex, persistent)

Does the server push updates to passive clients (feeds, notifications)?
→ SSE (simpler than WebSocket, HTTP-native, auto-reconnect)

Is the entire stack TypeScript and owned by one team?
→ tRPC (zero boilerplate, type-safe end-to-end)

Does an external system need to notify you when events occur?
→ Webhooks (event-driven push, polling eliminated)

Is this a legacy enterprise or regulated domain (banking, healthcare)?
→ SOAP (accept the complexity; interoperability with existing systems)

It Is Not Either-Or

Real systems commonly use multiple paradigms together: a public REST API for external consumers, gRPC internally between microservices, GraphQL for the customer-facing frontend, WebSocket for real-time features, and webhooks for third-party integrations.

HTTP/2 and HTTP/3 (QUIC)¶

Most modern API protocols run over HTTP or start from it (WebSocket begins with an HTTP upgrade), so understanding transport evolution is essential.

HTTP/2 (2015, RFC 7540; revised as RFC 9113 in 2022)¶

HTTP/2 is the minimum transport for gRPC and significantly improves REST/GraphQL performance.

Feature	HTTP/1.1	HTTP/2
Framing	Text-based	Binary frames
Multiplexing	❌ (one in-flight request per TCP connection)	✅ Multiple streams per connection
Header compression	❌	✅ HPACK
Server push	❌	✅ (rarely used; Chrome has since dropped support)
Connection limit	6 per origin (browser)	1 TCP connection, many concurrent streams (server-configured limit)
Head-of-line blocking	Yes, at HTTP layer	No at HTTP layer, but yes at TCP layer

The TCP head-of-line blocking problem: if a single TCP packet is lost, ALL HTTP/2 streams on that connection stall until retransmission completes. This is the fundamental limitation HTTP/3 solves.

HTTP/3 (2022, RFC 9114)¶

HTTP/3 replaces TCP with QUIC (UDP-based transport with built-in TLS 1.3).

graph TB
    subgraph "HTTP/2 Stack"
        H2[HTTP/2] --> TLS2[TLS 1.2/1.3]
        TLS2 --> TCP[TCP]
        TCP --> IP1[IP]
    end
    subgraph "HTTP/3 Stack"
        H3[HTTP/3] --> QUIC[QUIC\nbuilt-in TLS 1.3]
        QUIC --> UDP[UDP]
        UDP --> IP2[IP]
    end

Key improvements:

Feature	HTTP/2 (TCP)	HTTP/3 (QUIC)
Head-of-line blocking	Yes, at TCP level	No: each QUIC stream is delivered independently
Connection setup	TCP handshake + TLS handshake (2–3 RTT)	0-RTT or 1-RTT (TLS built into QUIC)
Connection migration	❌ New connection on network change	✅ Connection ID survives IP change
Packet loss recovery	Entire connection stalls	Only affected stream pauses
Congestion control	Kernel TCP (CUBIC, BBR)	User-space (pluggable, per-connection)
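The head-of-line and packet-loss rows can be illustrated with a toy model; this is not a real transport implementation. The setup:

- packets from streams A and B arrive one time unit apart,
- one packet is lost and arrives late after retransmission,
- the model computes when the application can read each packet.

With TCP, bytes are delivered strictly in order across the whole connection. With QUIC, ordering is enforced only within each stream. `delivery_times` is a hypothetical helper.

```python
from typing import List


def delivery_times(streams: List[str], lost: int, retransmit_delay: float,
                   per_stream_ordering: bool) -> List[float]:
    """streams[i] is the stream of the i-th packet sent. Packet i arrives at time i,
    except packet `lost`, which arrives `retransmit_delay` later."""
    arrival = [float(i) for i in range(len(streams))]
    arrival[lost] += retransmit_delay
    delivered = []
    for i, stream in enumerate(streams):
        # Readable once this packet and every earlier packet it is ordered behind have arrived
        ordered_behind = [j for j in range(i + 1)
                          if not per_stream_ordering or streams[j] == stream]
        delivered.append(max(arrival[j] for j in ordered_behind))
    return delivered


packets = ["A", "B", "A", "B"]
print("TCP  (HTTP/2):", delivery_times(packets, 0, 10, per_stream_ordering=False))
# TCP  (HTTP/2): [10.0, 10.0, 10.0, 10.0]  (stream B stalls behind A's lost packet)
print("QUIC (HTTP/3):", delivery_times(packets, 0, 10, per_stream_ordering=True))
# QUIC (HTTP/3): [10.0, 1.0, 10.0, 3.0]    (only stream A waits)
```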

Connection migration is particularly impactful for mobile APIs: when a phone switches from WiFi to cellular, HTTP/2 drops the TCP connection and must re-handshake. HTTP/3's connection ID persists across network changes — the connection continues seamlessly.

0-RTT resumption: returning clients can send data in the very first packet by reusing a previously negotiated TLS session. Crucial for latency-sensitive API calls on mobile networks.

0-RTT Replay Risk

0-RTT data can be replayed by a network attacker. Only use 0-RTT for idempotent operations (GET). Non-idempotent operations (POST) should wait for the full handshake.
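On the server side, RFC 8470 defines the mechanism. A TLS-terminating proxy that forwards a request received in early data adds `Early-Data: 1`, and the origin can answer `425 Too Early` so the client retries after the handshake completes. Below is a framework-agnostic sketch. It assumes a header mapping and a conservative allowlist of safe methods (`GET`, `HEAD`, `OPTIONS`). `early_data_status` is a hypothetical helper.

```python
from typing import Mapping, Optional

# Conservative: only safe methods may be served from replayable 0-RTT data
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def early_data_status(method: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return 425 if the request arrived as 0-RTT early data and is unsafe to replay."""
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("early-data") == "1" and method.upper() not in SAFE_METHODS:
        return 425  # Too Early: client should resend after the full handshake
    return None  # proceed normally
```

A `GET` with side effects (for example, one that increments a counter) is not safe here either. The allowlist is only as good as the API's own method semantics.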

gRPC and HTTP/3: the gRPC wire protocol is specified over HTTP/2, and most gRPC implementations require it. Some implementations offer experimental HTTP/3 support (for example, grpc-dotnet on .NET 6 and later), and QUIC libraries such as quic-go have been used to build custom transports. If HTTP/3 support becomes standard, QUIC's independent streams would eliminate the TCP-level head-of-line blocking that currently affects multiplexed gRPC connections.

Content Negotiation¶

Content negotiation lets client and server agree on response format:

# Client requests JSON, can accept XML as fallback
GET /v2/orders/42 HTTP/1.1
Accept: application/json, application/xml;q=0.9, */*;q=0.1
Accept-Language: en-US, fr;q=0.5
Accept-Encoding: gzip, br

# Server responds with chosen representation
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Language: en-US
Content-Encoding: br
Vary: Accept, Accept-Language, Accept-Encoding

The Vary header tells caches which request headers affect the response — critical for correct caching behavior.
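Server-side, negotiation boils down to ranking the client's media ranges by `q` and picking the first one the server can produce. Below is a simplified Python sketch. It does not implement the full RFC 9110 precedence rules, such as more specific ranges overriding `*/*`. `parse_accept` and `negotiate` are hypothetical names.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media:
            ranges.append((media.lower(), q))
    # sorted() is stable, so equal q values keep the client's order
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Return the best representation the server offers, or None (-> 406 Not Acceptable)."""
    for media, q in parse_accept(accept):
        if q == 0:
            continue  # q=0 means "not acceptable"
        for offer in available:
            type_, _, _ = offer.partition("/")
            if media in (offer, "*/*") or media == f"{type_}/*":
                return offer
    return None
```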

API versioning via content negotiation:

Accept: application/vnd.example.v2+json

This is often considered the most RESTful versioning approach (no URL pollution), but it is less discoverable than URI versioning.
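Extracting the version from such a header takes one regular expression. The sketch below assumes the `vnd.example` vendor prefix from the example above and defaults to v1 when no versioned type is requested; both are assumptions.

```python
import re

# Matches e.g. "application/vnd.example.v2+json" and captures the version number
VENDOR_TYPE = re.compile(r"application/vnd\.example\.v(\d+)\+json", re.IGNORECASE)


def requested_version(accept: str, default: int = 1) -> int:
    """Return the API version requested via the Accept header."""
    match = VENDOR_TYPE.search(accept)
    return int(match.group(1)) if match else default
```

The response should echo the chosen type in `Content-Type` and include `Vary: Accept`, so caches keep the versions apart.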

Architectural Patterns¶

Backend for Frontend (BFF)¶

The BFF pattern creates a dedicated API gateway per client type — each frontend gets an API layer optimized for its specific data needs.

flowchart LR
    subgraph Clients
        M[Mobile App]
        W[Web App]
        TV[Smart TV]
    end
    subgraph BFF Layer
        MB[Mobile BFF\nGo / Node.js]
        WB[Web BFF\nNode.js]
        TB[TV BFF\nNode.js]
    end
    subgraph Backend Services
        US[User Service]
        PS[Product Service]
        OS[Order Service]
    end

    M --> MB
    W --> WB
    TV --> TB
    MB --> US & PS & OS
    WB --> US & PS & OS
    TB --> US & PS

Why BFF over a single gateway:

Mobile needs minimal payloads while web needs rich data; one API can't optimize for both
Each BFF aggregates multiple backend calls into one client-optimized response
Teams can deploy BFFs independently; breaking a mobile BFF doesn't affect web
Authentication/session management can differ per client type
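The aggregation step is where a BFF earns its keep. Below is a minimal asyncio sketch of a mobile BFF endpoint that calls two backend services concurrently and trims the result to what a small screen shows. The `fetch_*` coroutines are hypothetical stand-ins for real HTTP or gRPC clients, and the fields and sample data are illustrative.

```python
import asyncio
from typing import Dict, List


# Hypothetical backend calls; in practice these would be HTTP/gRPC client calls
async def fetch_user(user_id: str) -> Dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": user_id, "name": "Alice", "email": "alice@example.com", "bio": "Long profile text"}


async def fetch_orders(user_id: str) -> List[Dict]:
    await asyncio.sleep(0.01)
    return [{"id": f"o{n}", "total": 10.0 * n, "line_items": ["sku-1"]} for n in range(1, 6)]


async def mobile_home(user_id: str) -> Dict:
    """Mobile BFF: one round trip for the client, minimal payload."""
    # Fan out to backend services in parallel rather than sequentially
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {
        "name": user["name"],
        "recent_orders": [{"id": o["id"], "total": o["total"]} for o in orders[:3]],
    }


async def web_home(user_id: str) -> Dict:
    """Web BFF: same services, richer response for a large screen."""
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {"user": user, "orders": orders}


if __name__ == "__main__":
    print(asyncio.run(mobile_home("42")))
```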

BFF vs GraphQL: GraphQL solves the over/under-fetching problem with client-specified queries, potentially eliminating the need for separate BFFs. However, BFF is still valuable when:

Clients need significantly different business logic (not just different fields)
The team wants to contain complexity behind a simple REST API per client
Backend services expose gRPC, and the BFF translates to REST/JSON for browser clients

GraphQL Persisted Queries¶

Persisted queries replace arbitrary client-sent GraphQL strings with pre-registered query IDs — improving security, performance, and bandwidth.

# Without persisted queries — client sends full query string
POST /graphql
{"query": "query GetUser($id: ID!) { user(id: $id) { name email posts { title } } }", "variables": {"id": "42"}}

# With persisted queries: client sends only the hash (value shown is illustrative)
POST /graphql
{"extensions": {"persistedQuery": {"version": 1, "sha256Hash": "ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4062f09b5bad56a6585"}}, "variables": {"id": "42"}}

Automatic persisted queries (APQ) flow (Apollo):

1. Client sends the query hash only
2. If the server doesn't recognize the hash, it returns PersistedQueryNotFound
3. Client retries with the full query string + hash
4. Server stores the mapping; subsequent requests use the hash only

Benefits:

Security: in locked-down mode, the server rejects any query not in the allowlist, which prevents arbitrary query attacks
Bandwidth: a hash (64 hex chars) replaces potentially multi-KB query strings
CDN caching: hash-based GET requests are cacheable at the edge (GET /graphql?extensions={...}&variables={...})
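On the server, APQ amounts to a hash-to-query map plus a check that the client's hash really matches the query text. Below is a minimal in-memory sketch; it is not Apollo Server's implementation, which uses a shared cache. `locked_down=True` models the allowlist mode, where unknown queries are never registered at request time. Class and exception names are hypothetical, apart from the `PersistedQueryNotFound` message from the flow above.

```python
import hashlib
from typing import Dict, Optional


class PersistedQueryNotFound(Exception):
    """Signals the client to retry with the full query text."""


class PersistedQueryStore:
    def __init__(self, locked_down: bool = False):
        self._queries: Dict[str, str] = {}
        self._locked_down = locked_down

    def register(self, query: str) -> str:
        """Build-time registration (the only path in locked-down mode)."""
        sha = hashlib.sha256(query.encode("utf-8")).hexdigest()
        self._queries[sha] = query
        return sha

    def resolve(self, sha256_hash: str, query: Optional[str] = None) -> str:
        """Return the query text to execute for an incoming request."""
        if sha256_hash in self._queries:
            return self._queries[sha256_hash]  # step 4: hash-only request hits
        if query is None:
            raise PersistedQueryNotFound()  # step 2: client must resend with the query
        if self._locked_down:
            raise PermissionError("query not in allowlist")
        # Step 3: verify the hash before storing, or clients could poison the cache
        if hashlib.sha256(query.encode("utf-8")).hexdigest() != sha256_hash:
            raise ValueError("provided sha256Hash does not match query")
        self._queries[sha256_hash] = query
        return query
```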

gRPC Health Checking Protocol¶

gRPC defines a standardized health checking protocol (grpc.health.v1) for load balancers and orchestrators:

syntax = "proto3";
package grpc.health.v1;

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

message HealthCheckRequest {
  string service = 1;  // empty string = overall server health
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}

# Check health with grpcurl
grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check

# Check specific service
grpcurl -plaintext -d '{"service": "orders.OrderService"}' \
  localhost:50051 grpc.health.v1.Health/Check

# Kubernetes gRPC health probe (k8s 1.24+)
# In pod spec:
# livenessProbe:
#   grpc:
#     port: 50051
#     service: ""

gRPC Server Reflection¶

Server reflection allows tools like grpcurl to discover services without .proto files — the gRPC equivalent of OpenAPI's /swagger.json:

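// Enable reflection in a Go gRPC server
import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/reflection"
)

func newServer() *grpc.Server {
	s := grpc.NewServer()
	pb.RegisterOrderServiceServer(s, &server{}) // pb: your generated package; server: your implementation
	reflection.Register(s)                      // enables runtime schema discovery
	return s
}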

# Discover all services (requires reflection)
grpcurl -plaintext localhost:50051 list

# Describe a specific service
grpcurl -plaintext localhost:50051 describe orders.OrderService

# Describe a message type
grpcurl -plaintext localhost:50051 describe orders.Order

Disable Reflection in Production

Like GraphQL introspection, gRPC reflection exposes your entire API surface. Disable it in production or restrict to authorized callers only.

API Performance Patterns¶

Request Compression¶

# Client sends compressed body
POST /v2/orders HTTP/1.1
Content-Encoding: gzip
Content-Type: application/json

# Client requests compressed response
GET /v2/orders HTTP/1.1
Accept-Encoding: gzip, br

Brotli (br) typically compresses JSON/text payloads 15–25% smaller than gzip, but costs more CPU at high quality levels. Many CDNs pre-compress static assets with Brotli. For dynamic API responses, gzip is often the better trade-off (faster compression, slightly larger output).
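Below is a sketch of response compression with Python's standard `gzip` module. It compresses only when the client advertises gzip and the body is large enough to benefit; the 1 KB threshold is an assumption, and tiny bodies can actually grow after compression. Brotli is omitted because it needs a third-party package. The `Accept-Encoding` parsing ignores `q` values for brevity. `encode_json_response` is a hypothetical helper.

```python
import gzip
import json
from typing import Dict, Tuple


def encode_json_response(payload: object, accept_encoding: str,
                         min_size: int = 1024) -> Tuple[bytes, Dict[str, str]]:
    body = json.dumps(payload).encode("utf-8")
    # Caches must key on Accept-Encoding, since the bytes differ per encoding
    headers = {"Content-Type": "application/json", "Vary": "Accept-Encoding"}
    accepted = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in accepted and len(body) >= min_size:
        body = gzip.compress(body, compresslevel=6)  # mid-level: a common speed/size balance
        headers["Content-Encoding"] = "gzip"
    return body, headers
```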

Connection Pooling¶

HTTP/1.1 clients should maintain a connection pool to avoid the overhead of TCP+TLS handshakes per request:

Setting	Typical Value	Notes
Pool size (per host)	20–100	Match to expected concurrency
Idle timeout	30–90s	Close idle connections to free resources
Max lifetime	5–10 min	Prevent sticky connections to a single backend
Health check interval	10s	Detect dead connections proactively

HTTP/2 clients typically use a single connection per host carrying many concurrent streams, up to the server's SETTINGS_MAX_CONCURRENT_STREAMS limit. Pooling matters less there, but keeping 2–3 connections is still useful for fault tolerance.
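The table's settings map onto a small amount of pool logic. Below is a minimal single-threaded sketch that reuses idle connections and enforces the idle timeout and maximum lifetime.

- `factory` is any callable returning an object with `close()`.
- `clock` is injectable for testing.
- `max_idle` caps idle connections only, not total concurrency.

All names are hypothetical. Production HTTP clients (for example, urllib3 or httpx in Python) already implement pooling, including the locking this sketch omits.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PooledConn:
    conn: Any
    created_at: float
    last_used: float


class ConnectionPool:
    def __init__(self, factory: Callable[[], Any], max_idle: int = 20,
                 idle_timeout: float = 60.0, max_lifetime: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._factory = factory
        self._max_idle = max_idle
        self._idle_timeout = idle_timeout
        self._max_lifetime = max_lifetime
        self._clock = clock
        self._idle: List[PooledConn] = []

    def acquire(self) -> PooledConn:
        now = self._clock()
        while self._idle:
            entry = self._idle.pop()  # LIFO: most recently used connection first
            too_idle = now - entry.last_used > self._idle_timeout
            too_old = now - entry.created_at > self._max_lifetime
            if not (too_idle or too_old):
                return entry  # reuse: no new TCP+TLS handshake
            entry.conn.close()  # expired: close it and keep looking
        return PooledConn(self._factory(), created_at=now, last_used=now)

    def release(self, entry: PooledConn) -> None:
        entry.last_used = self._clock()
        if len(self._idle) < self._max_idle:
            self._idle.append(entry)
        else:
            entry.conn.close()  # pool full: don't hoard connections
```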

ETag-Based Conditional Requests¶

First request:
  GET /v2/orders/42 → 200 OK, ETag: "abc123"

Subsequent request:
  GET /v2/orders/42
  If-None-Match: "abc123"
  → 304 Not Modified if unchanged (no body; client uses its cached copy)
  → 200 OK + new ETag if changed (body is the new version)

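ETags reduce bandwidth and server load. Use strong ETags when the validator must guarantee a byte-for-byte identical representation (required for range requests). Use weak ETags (W/"abc123") when semantically equivalent representations, such as the same data with different formatting, may share a validator.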
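Below is a minimal sketch of the server side. It assumes the ETag is derived from a hash of the serialized body; that is one common choice, and a version column or last-modified timestamp also works. `If-None-Match` uses weak comparison, so a `W/` prefix is ignored when matching. Function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag from a content hash (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(resource: dict, if_none_match: Optional[str]) -> Tuple[int, bytes, Dict[str, str]]:
    body = json.dumps(resource, sort_keys=True).encode("utf-8")  # stable serialization
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # Weak comparison: strip W/ prefixes; "*" matches any current representation
        tags = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, b"", headers  # Not Modified: client reuses its cached copy
    return 200, body, headers
```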

Async Request Collapsing (Request Deduplication)¶

When multiple clients request the same resource simultaneously, collapse them into a single backend request:

Time T=0:  Client A → GET /products/42
Time T=1ms: Client B → GET /products/42  (same key, collapse)
Time T=2ms: Client C → GET /products/42  (same key, collapse)
Time T=50ms: Backend returns → fan out to A, B, C

Result: 1 backend call instead of 3

Implemented in: Nginx (proxy_cache_lock), Varnish (built-in request coalescing; grace mode), Cloudflare, Envoy.
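Inside an application, the same idea is often called "single flight". Below is a minimal asyncio sketch, assuming one event loop per process. The first caller for a key starts the backend call as a task, and later callers await that same task. `asyncio.shield` keeps one caller's cancellation from cancelling the shared call for everyone. The class name is hypothetical; Go's `golang.org/x/sync/singleflight` package implements the same pattern.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlight:
    """Collapse concurrent calls with the same key into one backend call."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[Any]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            # First caller: start the real request and publish it for others
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # Forget the key once done, so later requests fetch fresh data
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # Every caller (the first included) awaits the same result or exception
        return await asyncio.shield(task)
```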

Benchmarks: Protocol Performance¶

Approximate comparisons under controlled conditions. Real-world performance depends heavily on payload, network, and implementation.

Metric	REST (JSON/HTTP2)	GraphQL (JSON/HTTP2)	gRPC (Protobuf/HTTP2)
Serialization size (1KB logical payload)	~1.2 KB	~1.0 KB (no over-fetching)	~0.4 KB
Serialization time	~1x baseline	~1x	~0.1–0.3x (binary)
Latency (unary, same DC)	~1–5ms	~2–8ms (resolver overhead)	~0.5–2ms
Throughput (single connection)	Multiplexed over HTTP/2 (HTTP/1.1: one in-flight request per connection)	Same as REST	Higher (binary framing, smaller payloads)
Browser support	✅ Native	✅ Native	⚠️ grpc-web proxy required
Streaming	❌ (SSE for server-push)	✅ Subscriptions (WS)	✅ Server, client, and bidirectional streaming

When Performance Matters Less

For most CRUD APIs, the difference between REST and gRPC latency is negligible compared to database query time. Choose the paradigm based on developer experience and client requirements, not raw protocol speed — unless you're building a low-latency trading system or processing millions of internal RPCs per second.