Web Services Architecture¶
Deep dive into every major API paradigm — how each protocol works under the hood, when to use each, and how they compare.
Protocol Comparison Overview¶
graph TD
A[Client needs data] --> B{Use case?}
B -->|Public API, CRUD, browser-native| C[REST]
B -->|Flexible queries, complex frontends| D[GraphQL]
B -->|Internal service-to-service, streaming| E[gRPC]
B -->|Real-time bidirectional| F[WebSocket]
B -->|Server pushes only, notifications| G[SSE]
B -->|TypeScript full-stack only| H[tRPC]
B -->|Event notification to external systems| I[Webhooks]
B -->|Legacy enterprise integration| J[SOAP]
| Protocol | Transport | Format | Direction | Browser Native | Best For |
|---|---|---|---|---|---|
| REST | HTTP/1.1, HTTP/2 | JSON (typically) | Req/Res | ✅ | Public APIs, CRUD, resource modeling |
| GraphQL | HTTP/1.1, HTTP/2 | JSON | Req/Res + Subscription | ✅ | Complex frontends, data aggregation |
| gRPC | HTTP/2 only | Protocol Buffers (binary) | Req/Res + Streaming | ⚠️ (needs proxy) | Internal microservices, high-throughput |
| SOAP | HTTP, SMTP, TCP | XML | Req/Res | ✅ | Legacy enterprise, financial services |
| WebSocket | WS (TCP upgrade) | Any (text/binary) | Full-duplex | ✅ | Real-time chat, gaming, collaboration |
| SSE | HTTP/1.1, HTTP/2 | Text (UTF-8) | Server → Client only | ✅ | Feeds, notifications, AI streaming |
| Webhooks | HTTP POST | JSON (typically) | Server → Client push | ✅ | Event-driven integrations, automation |
| tRPC | HTTP/WebSocket | JSON | Req/Res + Subscription | ✅ (Node/TS only) | TypeScript full-stack monorepos |
REST (Representational State Transfer)¶
Roy Fielding defined REST in his 2000 doctoral dissertation as an architectural style — not a protocol — built on six constraints that, when applied together, produce a scalable, stateless, and cacheable web service.
The Six Architectural Constraints¶
1. Client–Server Separation¶
The client and server evolve independently. The server manages data storage and business logic; the client manages the user interface and user state. Neither depends on the other's implementation details — only the shared API contract.
This decoupling allows frontend teams to swap frameworks (React → Vue) or mobile clients to evolve, without requiring backend changes, and vice versa.
2. Stateless¶
Every request from client to server must contain all information necessary to understand and process the request. The server stores no session state between requests.
❌ Stateful (server stores session):
POST /login → server creates session, returns cookie
GET /dashboard → server reads session to identify user
✅ Stateless (client carries state):
GET /dashboard
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...
Consequences:
- Scalability: any server instance can handle any request, with no sticky sessions
- Reliability: no session state to lose if a server crashes
- Overhead: every request must carry auth credentials and context (larger payloads)
3. Cacheable¶
Responses must declare whether they are cacheable or not. When responses are cacheable, clients and intermediaries (CDNs, proxies) can serve them without contacting the server.
Key HTTP cache headers:
| Header | Purpose | Example |
|---|---|---|
| Cache-Control | Directives for caching behavior | Cache-Control: max-age=3600, public |
| ETag | Fingerprint of resource version | ETag: "d8e8fca2dc0f896fd7cb4cb0031ba249" |
| Last-Modified | When resource last changed | Last-Modified: Wed, 22 Apr 2026 12:00:00 GMT |
| Vary | Which headers affect the cache key | Vary: Accept-Encoding, Authorization |
Conditional requests let clients validate their cache:
GET /users/42
If-None-Match: "d8e8fca2dc0f896fd7cb4cb0031ba249"
→ 304 Not Modified (body omitted — client uses cached copy)
→ 200 OK + new ETag + new body (cache miss — resource changed)
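A minimal server-side sketch of the validation flow above. The `CachedResource` shape and the `respondToConditionalGet` helper are hypothetical names for illustration, not part of any framework. The sketch treats `If-None-Match` as a comma-separated list and uses weak comparison, which ignores the rare case of commas inside a quoted tag:

```typescript
// Hypothetical types for illustration; not tied to any framework.
interface CachedResource {
  etag: string; // current fingerprint, including the surrounding quotes
  body: string;
}

interface ConditionalResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body?: string;
}

// Decide between 304 and 200 for a GET carrying an optional If-None-Match header.
function respondToConditionalGet(
  resource: CachedResource,
  ifNoneMatch: string | undefined,
): ConditionalResponse {
  // If-None-Match may hold "*" or a comma-separated list of entity tags.
  const candidates = (ifNoneMatch ?? "")
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);

  // Weak comparison: ignore a leading W/ prefix on either side.
  const strip = (tag: string): string => tag.replace(/^W\//, "");
  const matches = candidates.some(
    (tag) => tag === "*" || strip(tag) === strip(resource.etag),
  );

  if (matches) {
    // The client's copy is still valid: send headers only.
    return { status: 304, headers: { ETag: resource.etag } };
  }
  return { status: 200, headers: { ETag: resource.etag }, body: resource.body };
}
```

A client that sends no `If-None-Match` header, or a stale tag, always receives the full body together with the current `ETag`.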
4. Uniform Interface¶
The single most important constraint. It defines four sub-principles:
4a. Resource Identification in Requests — every resource has a stable URI:
/users → collection of users
/users/42 → specific user
/users/42/orders → orders belonging to user 42
/users/42/orders/7/items → items in that order
4b. Manipulation via Representations — clients hold representations (JSON, XML, HTML), not live objects. The client modifies the representation and sends it back.
4c. Self-Descriptive Messages — each request/response carries enough metadata to describe how to process it: Content-Type, method, status code, cache directives.
4d. HATEOAS — see section below.
5. Layered System¶
Clients cannot tell whether they're connected directly to the server or an intermediary (load balancer, CDN, API gateway, caching proxy). Each layer only knows about the adjacent layer.
This enables transparent insertion of:
- CDNs for caching at the edge
- API gateways for auth, rate limiting, routing
- Load balancers for distributing traffic
- Service meshes for observability and mTLS
6. Code on Demand (optional)¶
The only optional constraint. Servers can temporarily extend client functionality by transferring executable code (e.g., JavaScript). Rarely relevant in modern API design.
HTTP Methods and Idempotency¶
| Method | Semantics | Idempotent | Safe | Common Use |
|---|---|---|---|---|
| GET | Retrieve resource(s) | ✅ | ✅ | Read data |
| HEAD | GET without body (check existence/metadata) | ✅ | ✅ | Cache validation |
| POST | Create a new resource; non-idempotent actions | ❌ | ❌ | Create, submit form, trigger action |
| PUT | Replace entire resource (upsert) | ✅ | ❌ | Full update |
| PATCH | Partial update | ❌* | ❌ | Partial update |
| DELETE | Remove resource | ✅ | ❌ | Delete |
| OPTIONS | Discover allowed methods (used for CORS preflight) | ✅ | ✅ | CORS |
* PATCH can be designed idempotently but is not required to be.
Safe = no side effects (read-only). Idempotent = making the same request N times has the same effect as making it once.
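The difference between idempotent and non-idempotent methods shows up when a client retries. The sketch below uses a hypothetical in-memory `UserStore` as a stand-in for a server-side collection; the method names mirror the HTTP verbs only for illustration:

```typescript
// In-memory stand-in for a server-side collection (hypothetical, for illustration).
class UserStore {
  private users = new Map<string, { name: string }>();
  private nextId = 1;

  // POST /users: each call creates a new resource, so a retry changes state again.
  post(user: { name: string }): string {
    const id = String(this.nextId++);
    this.users.set(id, user);
    return id;
  }

  // PUT /users/:id: replaces the resource; a retry leaves the same final state.
  put(id: string, user: { name: string }): void {
    this.users.set(id, user);
  }

  // DELETE /users/:id: the first call removes the resource, later calls change nothing.
  delete(id: string): boolean {
    return this.users.delete(id);
  }

  count(): number {
    return this.users.size;
  }
}

const store = new UserStore();
store.put("42", { name: "Alice" });
store.put("42", { name: "Alice" }); // retried PUT: still exactly one user
store.post({ name: "Bob" });
store.post({ name: "Bob" }); // retried POST: a duplicate Bob now exists
```

Note that a repeated DELETE is idempotent in its effect on server state even though the response may differ (for example, success the first time and 404 afterwards).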
HTTP Status Codes¶
| Range | Category | Key Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 202 Accepted, 204 No Content |
| 3xx | Redirection | 301 Moved Permanently, 304 Not Modified |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Entity, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |
Common Status Code Mistakes
- Never return 200 OK with an error in the body; clients must parse every body to detect errors
- Use 401 for unauthenticated, 403 for authenticated but unauthorized
- Use 422 (not 400) when the request is syntactically valid but semantically wrong (e.g. invalid field value)
- 404 means "resource not found", not "I don't know"; don't use it as a catch-all
PATCH Semantics: JSON Patch vs JSON Merge Patch¶
PATCH is the most nuanced HTTP method. The two dominant formats behave very differently:
JSON Merge Patch (RFC 7396) — simple, intuitive; send only the fields you want to change:
PATCH /users/42 HTTP/1.1
Content-Type: application/merge-patch+json
{"email": "[email protected]", "phone": null}
Server merges the patch with the existing resource: email is updated, phone is removed (explicit null), all other fields are unchanged.
Limitation: you cannot set a field to null and leave it present — null always means "remove." This makes JSON Merge Patch unusable for APIs where null is a meaningful value.
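The merge rule is short enough to write out. This sketch follows the algorithm given in RFC 7396; the `Json` type alias and the `mergePatch` name are illustrative, and the email values are placeholders:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Apply a JSON Merge Patch to a target and return the patched copy.
function mergePatch(target: Json | undefined, patch: Json): Json {
  // A non-object patch (including an array) replaces the target wholesale.
  if (!isObject(patch)) return patch;

  const result: { [key: string]: Json } = isObject(target) ? { ...target } : {};
  for (const [name, value] of Object.entries(patch)) {
    if (value === null) {
      // null means "remove this member", which is why null cannot be stored.
      delete result[name];
    } else {
      result[name] = mergePatch(result[name], value);
    }
  }
  return result;
}

const patched = mergePatch(
  { name: "Alice", email: "old@example.com", phone: "555-0100" },
  { email: "new@example.com", phone: null },
);
// patched keeps name, carries the new email, and no longer has a phone member.
```

Because arrays are treated as opaque values, patching one element of a list requires sending the whole list again.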
JSON Patch (RFC 6902) — explicit operations array, more powerful but more complex:
PATCH /users/42 HTTP/1.1
Content-Type: application/json-patch+json
[
{ "op": "replace", "path": "/email", "value": "[email protected]" },
{ "op": "remove", "path": "/phone" },
{ "op": "add", "path": "/addresses/1", "value": {"city": "Berlin"} },
{ "op": "test", "path": "/version", "value": 3 }
]
Operations: add, remove, replace, move, copy, test. The test operation enables optimistic concurrency — the patch fails atomically if the tested value doesn't match.
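To illustrate the atomic behavior, here is a reduced sketch that supports only `add`, `remove`, `replace`, and `test` on non-root paths. It is not a complete RFC 6902 implementation: `move` and `copy` are omitted, and `test` compares values via `JSON.stringify`, which is sensitive to key order. All function names are hypothetical:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Operation =
  | { op: "add" | "replace" | "test"; path: string; value: Json }
  | { op: "remove"; path: string };

// Split a JSON Pointer (RFC 6901) into unescaped reference tokens.
function parsePointer(path: string): string[] {
  return path
    .split("/")
    .slice(1)
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Apply one operation to the container that holds the target location.
function applyToParent(parent: Json, key: string, op: Operation): void {
  if (Array.isArray(parent)) {
    const index = key === "-" ? parent.length : Number(key);
    if (op.op === "add") {
      if (!(index <= parent.length)) throw new Error(`path not found: ${op.path}`);
      parent.splice(index, 0, op.value);
      return;
    }
    if (!(index < parent.length)) throw new Error(`path not found: ${op.path}`);
    if (op.op === "remove") parent.splice(index, 1);
    else if (op.op === "replace") parent[index] = op.value;
    else if (JSON.stringify(parent[index]) !== JSON.stringify(op.value)) {
      throw new Error(`test failed at ${op.path}`);
    }
    return;
  }
  if (typeof parent !== "object" || parent === null) {
    throw new Error(`not a container: ${op.path}`);
  }
  if (op.op === "add") {
    parent[key] = op.value;
    return;
  }
  if (!Object.prototype.hasOwnProperty.call(parent, key)) {
    throw new Error(`path not found: ${op.path}`);
  }
  if (op.op === "remove") delete parent[key];
  else if (op.op === "replace") parent[key] = op.value;
  else if (JSON.stringify(parent[key]) !== JSON.stringify(op.value)) {
    throw new Error(`test failed at ${op.path}`);
  }
}

// Apply every operation to a deep copy; any failure throws and the input stays untouched.
function applyPatch(doc: Json, ops: Operation[]): Json {
  const copy: Json = JSON.parse(JSON.stringify(doc));
  for (const op of ops) {
    const tokens = parsePointer(op.path);
    const key = tokens.pop();
    if (key === undefined) throw new Error("root-level paths are not supported in this sketch");
    let parent: Json = copy;
    for (const token of tokens) {
      const next: Json | undefined = Array.isArray(parent)
        ? parent[Number(token)]
        : typeof parent === "object" && parent !== null
          ? parent[token]
          : undefined;
      if (next === undefined) throw new Error(`path not found: ${op.path}`);
      parent = next;
    }
    applyToParent(parent, key, op);
  }
  return copy;
}
```

Working on a copy is what makes a failed `test` atomic here: earlier operations in the same patch are discarded along with the copy.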
| Dimension | JSON Merge Patch | JSON Patch |
|---|---|---|
| RFC | 7396 | 6902 |
| Content-Type | application/merge-patch+json | application/json-patch+json |
| Format | Partial JSON object | Array of operations |
| Set field to null | ❌ (null = remove) | ✅ {"op": "replace", "path": "/x", "value": null} |
| Array operations | Replace entire array only | Add/remove individual elements |
| Atomicity | No built-in check | test operation for optimistic locking |
| Complexity | Low — just send partial object | Higher — must construct operation array |
| Adoption | More common; many public APIs accept merge-style partial updates | Less common; used when precision needed |
Practical Recommendation
Most APIs use JSON Merge Patch for simplicity. Use JSON Patch only when you need array element manipulation, optimistic concurrency via test, or the ability to distinguish "set to null" from "remove."
HATEOAS¶
Hypermedia as the Engine of Application State — the highest constraint of REST. Responses include hyperlinks that describe what actions are available next. Clients need no prior knowledge of URL structure; they navigate by following links.
{
"id": 42,
"name": "Alice",
"email": "[email protected]",
"_links": {
"self": { "href": "/users/42", "method": "GET" },
"orders": { "href": "/users/42/orders", "method": "GET" },
"update": { "href": "/users/42", "method": "PUT" },
"delete": { "href": "/users/42", "method": "DELETE" }
}
}
Benefits: API is self-documenting; server can change URL structure without breaking clients; workflow steps are discoverable.
In practice: very few production APIs implement full HATEOAS. Most APIs reach Level 2 of the Richardson Maturity Model (proper HTTP verbs) and stop there.
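On the client side, hypermedia-driven navigation means looking up a link by its relation name instead of building URLs. A small sketch, assuming the `_links` layout shown above; the `Resource` and `Link` types and both helper names are hypothetical:

```typescript
interface Link {
  href: string;
  method: string;
}

interface Resource {
  _links: Record<string, Link>;
  [field: string]: unknown;
}

// Look up an action by relation name; undefined means the server did not offer it.
function resolveAction(resource: Resource, rel: string): Link | undefined {
  return Object.prototype.hasOwnProperty.call(resource._links, rel)
    ? resource._links[rel]
    : undefined;
}

// List the relation names the server advertised for this resource.
function availableActions(resource: Resource): string[] {
  return Object.keys(resource._links);
}

const user: Resource = {
  id: 42,
  name: "Alice",
  _links: {
    self: { href: "/users/42", method: "GET" },
    orders: { href: "/users/42/orders", method: "GET" },
  },
};

const ordersLink = resolveAction(user, "orders"); // follow this instead of hard-coding the path
const deleteLink = resolveAction(user, "delete"); // undefined: the server did not advertise it
```

The client only hard-codes relation names such as `orders`; a UI could hide its delete button whenever the `delete` relation is missing.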
Richardson Maturity Model¶
A framework for measuring how RESTful an API actually is:
| Level | Name | What It Adds | Example |
|---|---|---|---|
| 0 | Swamp of POX | Single endpoint, single method | POST /api with XML body specifying action |
| 1 | Resources | Multiple URIs, but still single HTTP verb | POST /users, POST /users/42 |
| 2 | HTTP Verbs | Uses GET/POST/PUT/DELETE meaningfully | GET /users/42, DELETE /users/42 |
| 3 | Hypermedia | Responses contain links for navigation (HATEOAS) | JSON with _links section |
Roy Fielding stated that Level 3 is the pre-condition of REST. Most production APIs sit at Level 2 — which is fine for practical purposes, even if technically not "truly RESTful."
GraphQL¶
Facebook created GraphQL in 2012 and open-sourced it in 2015. It is a query language for your API and a runtime for executing those queries — giving clients the power to ask for exactly what they need and nothing more.
Core Concept: Single Endpoint¶
Unlike REST's resource-per-endpoint model, GraphQL exposes a single endpoint (typically POST /graphql) that accepts queries describing the exact shape of data needed.
# REST requires 3 round trips:
# GET /users/42
# GET /users/42/posts
# GET /posts/7/comments
# GraphQL fetches all in one request:
query {
user(id: 42) {
name
email
posts(limit: 5) {
title
publishedAt
comments(limit: 3) {
body
author { name }
}
}
}
}
Type System and Schema¶
Everything in GraphQL is strongly typed. The schema is the single source of truth — it describes every piece of data the API can return and every operation clients can perform.
Scalar Types¶
Built-in primitives: Int, Float, String, Boolean, ID. Custom scalars can be defined (e.g., DateTime, URL, JSON).
Object Types¶
type User {
id: ID! # ! = non-nullable
name: String!
email: String!
createdAt: DateTime!
posts: [Post!]! # non-null list of non-null Posts
}
type Post {
id: ID!
title: String!
body: String
author: User!
tags: [String!]!
}
Special Root Types¶
type Query {
user(id: ID!): User
users(limit: Int = 20, offset: Int = 0): [User!]!
}
type Mutation {
createUser(input: CreateUserInput!): User!
updateUser(id: ID!, input: UpdateUserInput!): User!
deleteUser(id: ID!): Boolean!
}
type Subscription {
userCreated: User!
messageReceived(roomId: ID!): Message!
}
Other Type Categories¶
| Type | Purpose | Example |
|---|---|---|
| Input | Arguments to mutations | input CreateUserInput { name: String!, email: String! } |
| Enum | Fixed set of values | enum Status { ACTIVE INACTIVE SUSPENDED } |
| Interface | Shared fields across types | interface Node { id: ID! } |
| Union | Type can be one of many | union SearchResult = User \| Post \| Comment |
| Fragment | Reusable field selection | fragment UserFields on User { id name email } |
Queries, Mutations, Subscriptions¶
Query — read data. Resolvers can be called in parallel:
query GetDashboard {
currentUser {
name
notifications(unread: true) { id title }
}
trending { title views }
}
Mutation — write data. Resolvers execute sequentially:
mutation CreatePost($input: CreatePostInput!) {
createPost(input: $input) {
id
title
author { name }
}
}
Subscription — real-time data via WebSocket (typically). Server pushes updates when events occur:
subscription OnMessageReceived($roomId: ID!) {
messageReceived(roomId: $roomId) {
id body sender { name } sentAt
}
}
Resolvers¶
Resolvers are functions that produce data for each field in the schema. GraphQL execution is a depth-first traversal of the query tree — each field resolver receives:
- parent: resolved value of the parent field
- args: arguments passed to this field
- context: shared object (DB connection, auth user, DataLoaders)
- info: query metadata (field name, selection set, schema)
const resolvers = {
Query: {
user: async (_, { id }, { db }) => db.users.findById(id),
users: async (_, { limit, offset }, { db }) =>
db.users.findAll({ limit, offset }),
},
User: {
// Parent resolver returned a user object; now resolve its posts field
posts: async (user, { limit }, { db }) =>
db.posts.findByUserId(user.id, limit),
},
Mutation: {
createUser: async (_, { input }, { db }) => db.users.create(input),
},
};
The N+1 Problem¶
The most common GraphQL performance trap. Without optimization, resolving a list of N users and their posts triggers 1 + N queries:
Query: users(limit: 20) → SELECT * FROM users LIMIT 20 (1 query)
User[0].posts → SELECT * FROM posts WHERE user_id = 1 (1 query)
User[1].posts → SELECT * FROM posts WHERE user_id = 2 (1 query)
...
User[19].posts → SELECT * FROM posts WHERE user_id = 20 (1 query)
TOTAL: 21 queries
Real-world impact compounds with nesting — posts fetching authors fetching their posts can generate hundreds of queries for a single GraphQL request.
DataLoader — The Solution¶
Facebook's DataLoader batches and caches loads within a single request using Node.js's event loop tick:
import DataLoader from 'dataloader';
// Created once per request (NOT per application startup)
const postsByUserLoader = new DataLoader(async (userIds: readonly string[]) => {
// Single batch query: SELECT * FROM posts WHERE user_id IN (1, 2, ..., 20)
const posts = await db.posts.findByUserIds(userIds);
// Return results in same order as input keys
return userIds.map(id => posts.filter(p => p.userId === id));
});
// In resolver (assumes the loader above is exposed as context.loaders.postsByUser): these 20 calls become ONE SQL query
const resolvers = {
User: {
posts: (user, _, { loaders }) =>
loaders.postsByUser.load(user.id), // batched automatically
},
};
Result: 21 queries → 2 queries (one for users, one batch for all posts).
DataLoader Instance Per Request
Create a new DataLoader instance for each request. DataLoader caches results for the duration of a request — sharing across requests will serve stale data.
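To make the batching mechanism concrete, here is a stripped-down loader written from scratch. It is a sketch of the idea, not the `dataloader` library's implementation: `TinyLoader`, `BatchFn`, and `Scheduler` are hypothetical names, and the scheduler is injectable so the flush point can be controlled explicitly:

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;
type Scheduler = (flush: () => void) => void;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
}

class TinyLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private cache = new Map<K, Promise<V>>();
  private batchFn: BatchFn<K, V>;
  private schedule: Scheduler;

  constructor(batchFn: BatchFn<K, V>, schedule?: Scheduler) {
    this.batchFn = batchFn;
    // Default: flush in a microtask, after the current synchronous work finishes.
    this.schedule = schedule ?? ((flush) => { void Promise.resolve().then(flush); });
  }

  load(key: K): Promise<V> {
    // Per-instance cache: a repeated key reuses the same promise.
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const promise = new Promise<V>((resolve, reject) => {
      // The first key queued in a batch schedules exactly one flush.
      if (this.queue.length === 0) this.schedule(() => this.flush());
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    // One call for all queued keys; results are matched to keys by position.
    this.batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i])),
      (reason) => batch.forEach((item) => item.reject(reason)),
    );
  }
}
```

Because the cache lives on the instance, creating one loader per request also bounds the cache lifetime to that request, which is the reason for the per-request rule above.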
Directives¶
Directives annotate schema elements or control query execution:
type User {
email: String! @deprecated(reason: "Use contactEmail instead")
contactEmail: String!
password: String! @auth(requires: ADMIN) # custom directive
}
# Built-in execution directives:
query GetUser($showEmail: Boolean!, $skipPhone: Boolean!) {
user(id: 42) {
name
email @include(if: $showEmail) # conditionally include field
phone @skip(if: $skipPhone) # conditionally skip field
}
}
Introspection¶
GraphQL APIs are self-documenting: clients can query the schema itself through the built-in __schema and __type introspection fields.
Introspection powers tools like GraphiQL, Apollo Studio, and GraphQL Playground. Disable introspection in production for security-sensitive APIs.
Query Complexity and Depth Limiting¶
Without limits, a malicious client can craft exponentially expensive queries:
# Denial-of-service via deeply nested query:
{ user { friends { friends { friends { friends { ... } } } } } }
Protect with:
- Depth limiting: reject queries deeper than N levels (graphql-depth-limit)
- Complexity analysis: assign costs to fields; reject queries over a budget (graphql-validation-complexity)
- Query whitelisting (persisted queries): only allow pre-approved queries in production
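Depth limiting can be approximated with a few lines. The sketch below estimates depth by tracking brace nesting in the raw query text; this is a deliberate simplification (input object literals and block strings would skew it), and a production check should walk the parsed query AST instead. Both function names are hypothetical:

```typescript
// Estimate selection-set depth by counting brace nesting outside of string literals.
function queryDepth(query: string): number {
  let depth = 0;
  let max = 0;
  let inString = false;
  for (let i = 0; i < query.length; i++) {
    const ch = query[i];
    if (ch === '"' && query[i - 1] !== "\\") inString = !inString;
    if (inString) continue;
    if (ch === "{") {
      depth++;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth--;
    }
  }
  return max;
}

// Reject a query before execution if it nests deeper than the configured limit.
function assertWithinDepth(query: string, limit: number): void {
  const depth = queryDepth(query);
  if (depth > limit) {
    throw new Error(`query depth ${depth} exceeds limit ${limit}`);
  }
}
```

Running the check before execution means an over-deep query costs only a string scan, not resolver work.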
Federation¶
GraphQL Federation lets multiple teams own separate subgraphs that compose into a unified supergraph — one schema, one endpoint, distributed implementation.
┌─────────────────────────────────────────────┐
│ Apollo Router (Supergraph) │
│ Single endpoint: POST /graphql │
└────────┬──────────────┬──────────────────────┘
│ │
┌─────▼─────┐ ┌─────▼──────┐ ┌──────────┐
│ Users │ │ Products │ │ Orders │
│ Subgraph │ │ Subgraph │ │ Subgraph │
│ (Team A) │ │ (Team B) │ │ (Team C) │
└───────────┘ └────────────┘ └──────────┘
Key concepts:
- Entities: types that can be extended across subgraphs, identified by a @key directive
- __resolveReference: resolver that hydrates an entity from a key passed by the router
- @external: field defined in another subgraph, referenced here
- Each subgraph is independently deployable; the router composes them at query time
Federation vs Schema Stitching¶
Before Federation, schema stitching was the primary approach to composing multiple GraphQL services. They solve the same problem differently:
| Dimension | Schema Stitching | Federation |
|---|---|---|
| Composition | Gateway merges schemas at runtime | Router composes via a supergraph schema |
| Type ownership | Gateway defines cross-service types | Each subgraph owns its types via @key |
| Coupling | Gateway knows about all subgraphs' internal types | Subgraphs are self-contained; router only knows entities |
| Deployment | Change in one subgraph may require gateway redeploy | Subgraphs deploy independently |
| Conflict resolution | Manual: gateway resolves field name conflicts | Automatic: @override, @provides, @shareable directives |
| Tooling | GraphQL Tools (@graphql-tools/stitch) | Apollo Router, Apollo Studio, Cosmo Router |
| Status | Still works; no longer recommended for new projects | Industry standard for multi-team GraphQL |
When stitching still makes sense: small teams, legacy services being gradually migrated, or when you need to compose third-party GraphQL APIs you don't control (federation requires subgraphs to add @key directives).
Error Handling¶
GraphQL errors behave fundamentally differently from REST:
Partial responses — in REST, an error means the entire response fails. In GraphQL, individual fields can fail while the rest of the response succeeds:
{
"data": {
"user": {
"name": "Alice",
"email": "[email protected]",
"creditScore": null
}
},
"errors": [
{
"message": "Unauthorized to access creditScore",
"locations": [{ "line": 5, "column": 5 }],
"path": ["user", "creditScore"],
"extensions": {
"code": "UNAUTHORIZED",
"classification": "AUTHORIZATION"
}
}
]
}
The data field contains whatever succeeded; errors contains what failed. The client must handle both.
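One way a client might handle both halves is to index the errors by their `path`, then consult that index while rendering `data`. A sketch under the response shape shown above; the type names and `errorsByPath` are hypothetical:

```typescript
interface GraphQLErrorEntry {
  message: string;
  path?: (string | number)[];
  extensions?: { code?: string };
}

interface GraphQLResult<T> {
  data?: T | null;
  errors?: GraphQLErrorEntry[];
}

// Map "user.creditScore"-style paths to an error code (or the message if no code is set).
function errorsByPath<T>(result: GraphQLResult<T>): Map<string, string> {
  const byPath = new Map<string, string>();
  for (const err of result.errors ?? []) {
    byPath.set((err.path ?? []).join("."), err.extensions?.code ?? err.message);
  }
  return byPath;
}

const result: GraphQLResult<{ user: { name: string; creditScore: number | null } }> = {
  data: { user: { name: "Alice", creditScore: null } },
  errors: [
    {
      message: "Unauthorized to access creditScore",
      path: ["user", "creditScore"],
      extensions: { code: "UNAUTHORIZED" },
    },
  ],
};

const fieldErrors = errorsByPath(result);
// The UI can still render user.name and show a notice only for the failed field.
```

A `null` in `data` is therefore ambiguous on its own; the matching entry in `errors` is what distinguishes "failed" from "legitimately null".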
Error extensions — the extensions field is the standard way to add machine-readable error metadata:
// Apollo Server — throw typed error with extensions
import { GraphQLError } from 'graphql';
throw new GraphQLError('Order not found', {
extensions: {
code: 'NOT_FOUND',
http: { status: 404 },
orderId: input.id,
traceId: ctx.traceId,
},
});
Error masking — in production, mask internal errors to prevent leaking implementation details:
// Apollo Server 4 — format error for production
const server = new ApolloServer({
typeDefs,
resolvers,
formatError: (formattedError, error) => {
// Log full error internally
logger.error(error);
// Return sanitized error to client
if (formattedError.extensions?.code === 'INTERNAL_SERVER_ERROR') {
return { message: 'Internal server error', extensions: { code: 'INTERNAL_SERVER_ERROR' } };
}
return formattedError;
},
});
Error classification patterns:
| Code | Meaning | HTTP Equivalent |
|---|---|---|
| BAD_USER_INPUT | Invalid query variables | 400 |
| UNAUTHENTICATED | Missing or invalid auth | 401 |
| FORBIDDEN | Authenticated but not authorized | 403 |
| NOT_FOUND | Resource doesn't exist | 404 |
| GRAPHQL_VALIDATION_FAILED | Query doesn't match schema | 400 |
| PERSISTED_QUERY_NOT_FOUND | Unknown query hash (APQ miss) | 400 |
| INTERNAL_SERVER_ERROR | Unhandled server error | 500 |
Caching¶
GraphQL caching is fundamentally harder than REST caching because requests use POST with dynamic query bodies — HTTP caches can't distinguish between different queries to the same /graphql endpoint.
HTTP-level caching (limited):
- GET requests for queries: GET /graphql?query={user(id:42){name}}&variables={} — cacheable by CDN, but URL length limits apply
- Automatic Persisted Queries (APQ) solve this: GET /graphql?extensions={"persistedQuery":{"sha256Hash":"abc..."}}&variables={"id":"42"} — short, cacheable, CDN-friendly
Client-side normalized caching (Apollo Client):
Apollo Client maintains an in-memory normalized cache keyed by __typename:id:
Cache store:
User:42 → { __typename: "User", id: "42", name: "Alice", email: "[email protected]" }
Post:7 → { __typename: "Post", id: "7", title: "Hello", author: { __ref: "User:42" } }
Post:8 → { __typename: "Post", id: "8", title: "World", author: { __ref: "User:42" } }
When a mutation updates User:42, every query displaying that user re-renders automatically — no manual cache invalidation. This is the primary DX advantage of GraphQL over REST for complex frontends.
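The normalization step itself can be sketched in a few lines. This is an illustration of the idea, not Apollo Client's actual implementation: the `Store` shape and the `normalize` function are hypothetical, and every object with a string `__typename` and `id` is treated as an entity:

```typescript
type Entity = { __typename: string; id: string; [field: string]: unknown };
type Store = Record<string, Record<string, unknown>>;

function isEntity(value: unknown): value is Entity {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Entity).__typename === "string" &&
    typeof (value as Entity).id === "string"
  );
}

// Flatten a response tree into the store; nested entities become { __ref } pointers.
function normalize(value: unknown, store: Store): unknown {
  if (Array.isArray(value)) return value.map((item) => normalize(item, store));
  if (typeof value !== "object" || value === null) return value;

  const fields: Record<string, unknown> = {};
  for (const [name, child] of Object.entries(value)) {
    fields[name] = normalize(child, store);
  }
  if (!isEntity(value)) return fields;

  // Merge into any existing record so every query sees one shared copy.
  const key = `${value.__typename}:${value.id}`;
  store[key] = { ...store[key], ...fields };
  return { __ref: key };
}
```

Since both posts point at the same `User:42` record, writing a mutation result for that user updates the single record that every referencing view reads from.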
Cache policies:
| Policy | Behavior | Use Case |
|---|---|---|
| cache-first | Read from cache; network only on miss | Default; best for mostly-static data |
| network-only | Always fetch; update cache | Dashboards, real-time displays |
| cache-and-network | Return cache immediately, then update with network | Instant UI + fresh data |
| no-cache | Fetch without reading or updating cache | One-off queries, sensitive data |
Server-side caching:
- Response-level: cache full GraphQL responses keyed by query hash + variables (Redis)
- Resolver-level: cache individual resolver results (DataLoader already provides per-request caching; add Redis for cross-request caching)
- @cacheControl directive (Apollo): per-field cache hints
type Product @cacheControl(maxAge: 3600) {
id: ID!
name: String!
price: Float! @cacheControl(maxAge: 60) # price changes more often
reviews: [Review!]! @cacheControl(maxAge: 300)
}
gRPC¶
gRPC is a high-performance, open-source RPC framework, originally developed at Google, that uses Protocol Buffers as its interface definition language and serialization format, and HTTP/2 as the transport protocol. It has been a CNCF project since 2017.
Protocol Buffers (Protobuf)¶
Protobuf is a language-neutral, platform-neutral binary serialization format. Compared to JSON:
| Property | JSON | Protobuf |
|---|---|---|
| Format | Text (UTF-8) | Binary |
| Size | ~1x baseline | 3–10x smaller |
| Parse speed | ~1x baseline | 5–10x faster |
| Schema | Optional (JSON Schema) | Required (.proto file) |
| Human-readable | ✅ | ❌ (need tools) |
| Schema evolution | Manual / fragile | Built-in field numbering |
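Part of the size difference comes from how Protobuf encodes integers and field identity. The sketch below hand-encodes a single varint field to show the wire layout; it is for illustration only (real code would use generated classes), and the function names are hypothetical:

```typescript
// Encode a non-negative integer as a base-128 varint, least significant group first.
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let rest = value;
  while (rest > 0x7f) {
    bytes.push((rest & 0x7f) | 0x80); // low 7 bits plus a continuation flag
    rest = Math.floor(rest / 128);
  }
  bytes.push(rest);
  return bytes;
}

// Encode one varint-typed field: the tag is (field_number << 3) | wire_type, with wire type 0.
function encodeVarintField(fieldNumber: number, value: number): number[] {
  return encodeVarint(fieldNumber * 8).concat(encodeVarint(value));
}

// Field number 1 holding the value 150 takes three bytes on the wire.
const encoded = encodeVarintField(1, 150); // [0x08, 0x96, 0x01]
const jsonSize = JSON.stringify({ a: 150 }).length; // 9 characters for the JSON equivalent
```

Only the field number travels on the wire, never the field name, which is why renaming a field is safe but reusing a field number is not.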
A .proto service definition:
syntax = "proto3";
package com.example.users;
// Message types
message User {
string id = 1;
string name = 2;
string email = 3;
int64 created_at = 4;
}
message GetUserRequest { string user_id = 1; }
message CreateUserRequest {
string name = 1;
string email = 2;
}
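message UserList { repeated User users = 1; }
// Illustrative definitions for the types referenced by the service below; the fields are assumptions
message ListUsersRequest { int32 page_size = 1; }
message ChatMessage { string sender_id = 1; string text = 2; }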
// Service definition
service UserService {
// Unary
rpc GetUser(GetUserRequest) returns (User);
// Server streaming
rpc ListUsers(ListUsersRequest) returns (stream User);
// Client streaming
rpc CreateUsersBulk(stream CreateUserRequest) returns (UserList);
// Bidirectional streaming
rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}
The protoc compiler generates strongly-typed client stubs and server interfaces in Go, Java, Python, C++, Node.js, Rust, Kotlin, Swift, and more.
HTTP/2 Features Exploited by gRPC¶
| HTTP/2 Feature | What It Enables |
|---|---|
| Multiplexing | Multiple RPC calls on one TCP connection; no HTTP-level head-of-line blocking between requests (TCP-level blocking on packet loss remains) |
| Binary framing | Headers and data sent as binary frames — more efficient than HTTP/1.1 text headers |
| Header compression (HPACK) | Repeated headers (auth token, content-type) sent as index references after first use; 85–90% header reduction |
| Full-duplex streams | Client and server can send frames simultaneously on the same stream |
| Flow control | Prevents fast producers from overwhelming slow consumers per-stream |
| Server push | Server can pre-emptively send resources (not used by gRPC) |
The Four Streaming Types¶
Unary RPC¶
Classic request-response. Client sends one message, server sends one message. Equivalent to a REST GET.
Server Streaming RPC¶
Client sends one request; server streams multiple responses. Useful for: live logs, large dataset export, real-time feeds.
Client Streaming RPC¶
Client streams multiple messages; server collects them and returns one response. Useful for: telemetry ingestion, file uploads chunked by the client, batch writes.
Bidirectional Streaming RPC¶
Both sides can send and receive messages in any order over a long-lived connection. Both streams operate independently. Useful for: chat, collaborative editing, real-time games, audio/video signaling.
Deadlines and Cancellation¶
Every gRPC call should set a deadline — the absolute time by which the client requires a response. The server checks whether the deadline has been exceeded before starting expensive work.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "42"})
Deadlines propagate through the entire call chain — if service A calls service B calls service C, all three respect the same deadline window, preventing one slow downstream call from causing timeouts at every layer.
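The arithmetic behind propagation is simple: each hop forwards whatever is left of the original budget instead of starting a fresh timeout. A sketch with hypothetical names and made-up timings, using plain millisecond counters in place of real clocks:

```typescript
// Remaining time budget for a downstream call, given an absolute deadline.
// Both arguments are in milliseconds on the same clock; the names are illustrative.
function remainingBudgetMs(deadlineMs: number, nowMs: number): number {
  const remaining = deadlineMs - nowMs;
  if (remaining <= 0) {
    // Fail fast instead of starting work the caller can no longer use.
    throw new Error("DEADLINE_EXCEEDED");
  }
  return remaining;
}

// Service A receives a request at t=0 with a 5000 ms deadline.
const deadline = 5000;
const budgetForB = remainingBudgetMs(deadline, 1200); // A spent 1200 ms, so B gets 3800 ms
const budgetForC = remainingBudgetMs(deadline, 4200); // B spent 3000 ms more, so C gets 800 ms
```

If each service instead applied its own fixed 5-second timeout, C could keep working long after A's caller had given up.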
Interceptors¶
Interceptors wrap gRPC method invocations — the gRPC equivalent of middleware:
// Unary server interceptor for logging
func loggingInterceptor(ctx context.Context, req interface{},
info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (interface{}, error) {
start := time.Now()
resp, err := handler(ctx, req)
log.Printf("Method: %s | Duration: %v | Error: %v",
info.FullMethod, time.Since(start), err)
return resp, err
}
// Register:
s := grpc.NewServer(
grpc.UnaryInterceptor(loggingInterceptor),
grpc.StreamInterceptor(streamLoggingInterceptor),
)
Common interceptors: authentication, tracing (OpenTelemetry), logging, metrics, panic recovery, rate limiting, deadline enforcement.
Load Balancing¶
Because gRPC multiplexes many RPCs over a single TCP connection, L4 (TCP) load balancing distributes connections, not RPCs. A single long-lived connection from service A to a single pod of service B bypasses all other pods.
Solutions:
- L7 (application-layer) load balancing: the proxy understands HTTP/2 streams and distributes individual RPCs (Envoy, nginx, gRPC-aware load balancers)
- Client-side load balancing: the gRPC client resolves all backend IPs (via DNS), maintains connections to each, and distributes RPCs itself
- Headless services in Kubernetes: DNS returns all pod IPs; combined with gRPC client-side round-robin
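The client-side option boils down to choosing a backend per RPC instead of per connection. A minimal round-robin picker as a sketch; `RoundRobinPicker` and the addresses are hypothetical, and real gRPC clients also handle health checking and re-resolution:

```typescript
// Rotate through a fixed list of resolved backend addresses, one pick per RPC.
class RoundRobinPicker {
  private next = 0;
  private backends: string[];

  constructor(backends: string[]) {
    if (backends.length === 0) throw new Error("at least one backend is required");
    this.backends = backends.slice();
  }

  pick(): string {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// Placeholder pod addresses, as a headless service lookup might return them.
const picker = new RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]);
const targets = [picker.pick(), picker.pick(), picker.pick(), picker.pick()];
// The fourth RPC wraps around to the first backend.
```

With an L4 balancer in front of a single connection, all four of these RPCs would have landed on the same pod.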
gRPC-Web (Browser Bridge)¶
Browsers cannot make native HTTP/2 gRPC calls (no access to HTTP/2 frames or trailers). gRPC-Web bridges this gap with a protocol translation proxy.
flowchart LR
B[Browser\ngRPC-Web Client] -->|HTTP/1.1 or HTTP/2\nContent-Type: application/grpc-web| P[Envoy Proxy\ngRPC-Web Filter]
P -->|Native HTTP/2 gRPC| S[gRPC Server]
How it works:
1. Browser client uses grpc-web or @connectrpc/connect-web to make gRPC calls
2. Calls are encoded as application/grpc-web (base64 or binary) over standard HTTP
3. Envoy proxy (or Connect protocol server) translates to native gRPC
4. Server sees standard gRPC requests — no code changes needed
// Browser client using Connect (modern alternative to grpc-web)
import { createClient } from "@connectrpc/connect";
import { createGrpcWebTransport } from "@connectrpc/connect-web";
import { UserService } from "./gen/users_connect";
const transport = createGrpcWebTransport({
baseUrl: "https://api.example.com",
});
const client = createClient(UserService, transport);
const user = await client.getUser({ userId: "42" });
gRPC-Web limitations:
- Only unary and server-streaming RPCs (no client-streaming or bidirectional)
- Requires a proxy (Envoy, Connect, nginx) unless using Connect protocol natively
- Slightly higher latency due to protocol translation
Connect protocol (from Buf) is the modern alternative: supports gRPC, gRPC-Web, and a new Connect protocol natively — all three over a single HTTP endpoint, with browser support without a proxy for the Connect wire format.
SOAP / XML-RPC¶
SOAP (Simple Object Access Protocol) is the predecessor to REST. Still deeply embedded in enterprise systems, financial services, healthcare (HL7), and government integrations.
Protocol Structure¶
A SOAP message is an XML document with a mandatory Envelope, optional Header, and mandatory Body:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:usr="http://example.com/users">
<soap:Header>
<usr:AuthToken>abc123</usr:AuthToken>
</soap:Header>
<soap:Body>
<usr:GetUser>
<usr:UserId>42</usr:UserId>
</usr:GetUser>
</soap:Body>
</soap:Envelope>
WSDL (Web Services Description Language)¶
WSDL is SOAP's IDL — an XML document that describes the service completely: operations, input/output message types, bindings (how operations map to protocols), and endpoints. It serves the same role as OpenAPI for REST or .proto files for gRPC.
<wsdl:definitions name="UserService" ...>
<wsdl:types>
<xs:schema>
<xs:element name="GetUserRequest">
<xs:complexType>
<xs:sequence>
<xs:element name="UserId" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
</wsdl:types>
<wsdl:message name="GetUserInput">
<wsdl:part name="parameters" element="tns:GetUserRequest"/>
</wsdl:message>
<wsdl:portType name="UserServicePortType">
<wsdl:operation name="GetUser">
<wsdl:input message="tns:GetUserInput"/>
<wsdl:output message="tns:GetUserOutput"/>
</wsdl:operation>
</wsdl:portType>
</wsdl:definitions>
SOAP vs REST¶
| Dimension | SOAP | REST |
|---|---|---|
| Payload | XML (verbose) | JSON (compact) |
| Contract | WSDL (machine-readable) | OpenAPI (optional) |
| Transport | HTTP, SMTP, TCP | HTTP only |
| State | Stateful or stateless | Stateless |
| Security | WS-Security (powerful but complex) | OAuth 2.0, JWT, mTLS |
| Error handling | soap:Fault (standardized) | HTTP status codes (convention-based) |
| Tooling | Mature but heavy | Light and universal |
| Still used for | Banking, insurance, health (HL7), government | Virtually everything new |
XML-RPC predates SOAP — a simpler, less extensible ancestor using XML payloads over HTTP POST. Effectively obsolete.
WebSocket¶
WebSocket provides a persistent, full-duplex TCP connection between client and server, established via an HTTP upgrade handshake. Once established, either side can send messages at any time with minimal overhead.
Handshake¶
# Client initiates upgrade:
GET /ws HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
# Server confirms upgrade:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
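The `Sec-WebSocket-Accept` value in the response is derived from the client's key: the server appends a fixed GUID defined by RFC 6455, hashes the result with SHA-1, and base64-encodes the digest. A sketch using Node's built-in `crypto` module; the function name is hypothetical:

```typescript
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Compute the Sec-WebSocket-Accept header value for a given Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// The key from the handshake above yields the accept value shown in the server response.
const accept = computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ==");
```

The exchange proves that the server understood the upgrade request as WebSocket; it is not an authentication mechanism.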
After the handshake, the connection is no longer HTTP. Data flows as frames — the minimal overhead unit:
| Frame Type | Description |
|---|---|
| Text frame | UTF-8 text message |
| Binary frame | Raw bytes (audio, video, protobuf) |
| Ping frame | Heartbeat probe (either endpoint may send it) |
| Pong frame | Heartbeat response |
| Close frame | Graceful connection termination |
Connection Management¶
The primary operational challenge of WebSocket is connection state management:
- Heartbeats (ping/pong): detect dead connections that appear open at the TCP layer. Servers should send pings every 30–60s; if no pong arrives, close and clean up.
- Reconnection: clients should implement exponential backoff when the connection drops. Libraries like reconnecting-websocket handle this automatically.
- Backpressure: if a slow client can't consume fast enough, the server's send buffer fills. Monitor ws.bufferedAmount on the client, or implement application-level flow control.
- Horizontal scaling: WebSocket connections are stateful and sticky. A message sent by user A (connected to server 1) destined for user B (connected to server 2) must be routed between servers via a pub/sub layer (Redis Pub/Sub, Kafka).
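The reconnection bullet above can be sketched as a delay schedule. This is an illustrative sketch, not the algorithm of any particular library: the 1 s base, the 30 s cap, and the function names are assumptions. The jittered variant spreads reconnects out so that many clients dropped at once do not all return at the same instant.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound in seconds on the wait before reconnect attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def jittered_delay(attempt: int, rng: random.Random, base: float = 1.0, cap: float = 30.0) -> float:
    """Full jitter: a random wait between 0 and the exponential bound."""
    return rng.uniform(0.0, backoff_delay(attempt, base, cap))


schedule = [backoff_delay(n) for n in range(7)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

A client would sleep for `jittered_delay(attempt, rng)` before each reconnect and reset `attempt` to 0 once a connection succeeds.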
When to Use WebSocket¶
- Interactive real-time features: chat, collaborative document editing, multiplayer gaming
- Financial data: live order books, tick-by-tick price feeds
- IoT: bidirectional device control with low latency
- When the client sends frequent data to the server (>1 msg/second)
Server-Sent Events (SSE)¶
SSE is a web standard for server-to-client streaming over plain HTTP (originally a W3C specification, now maintained in the WHATWG HTML Living Standard). Unlike WebSocket, there is no protocol upgrade: it's just a long-lived HTTP response with Content-Type: text/event-stream.
Protocol¶
Server response:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

id: 1
event: message
data: {"type": "notification", "text": "Hello!"}

id: 2
event: update
data: {"user": "alice", "status": "online"}

: heartbeat comment (ignored by client)
SSE message fields:
| Field | Purpose |
|---|---|
| data: | The message payload (can span multiple lines) |
| event: | Custom event type (client listens via addEventListener) |
| id: | Message ID; sent as Last-Event-ID header on reconnect |
| : (comment) | Ignored by client; used for keepalive pings |
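To make the field rules in the table concrete, here is a simplified parser sketch for the text/event-stream format. It assumes the whole stream is already in memory as a string and ignores the `retry:` field; `SSEEvent` and `parse_sse` are illustrative names, not a library API. Note that a blank line is what terminates each event.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class SSEEvent:
    event: str = "message"  # default type when no `event:` field is sent
    data: str = ""
    id: Optional[str] = None


def parse_sse(stream: str) -> Iterator[SSEEvent]:
    """Yield one SSEEvent per blank-line-terminated block in `stream`."""
    event_type: str = "message"
    data_lines: List[str] = []
    last_id: Optional[str] = None
    for line in re.split(r"\r\n|\n|\r", stream):
        if line == "":
            # A blank line dispatches the event, but only if it carried data.
            if data_lines:
                yield SSEEvent(event_type, "\n".join(data_lines), last_id)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, typically a keepalive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # exactly one leading space is stripped
        if field == "data":
            data_lines.append(value)  # multiple data lines are joined with \n
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value  # persists across events until overwritten


sample = (
    "id: 1\nevent: message\ndata: {\"type\": \"notification\"}\n\n"
    ": heartbeat\n"
    "id: 2\nevent: update\ndata: line one\ndata: line two\n\n"
)
events = list(parse_sse(sample))
print(events)
```

In a browser, EventSource does this parsing for you; a hand-written parser is mainly useful for non-browser clients.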
Auto-Reconnection¶
SSE's killer feature: if the connection drops, the browser automatically reconnects and sends the Last-Event-ID header — the server can resume from where it left off. No client code required.
const source = new EventSource('/events');
source.addEventListener('message', e => console.log(e.data));
source.addEventListener('update', e => handleUpdate(JSON.parse(e.data)));
source.onerror = e => console.error('SSE error', e);
// Reconnection happens automatically — no manual retry logic needed
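Automatic reconnection only resumes cleanly if the server honors Last-Event-ID. The sketch below shows one way the server side could replay missed events. It assumes monotonically increasing integer IDs and an in-memory event log of (id, event, data) tuples; both are illustrative choices, and `format_sse` and `replay_from` are hypothetical helper names.

```python
from typing import Iterable, Iterator, Optional, Tuple


def format_sse(event_id: int, event: str, data: str) -> str:
    """Serialize one event in text/event-stream framing (blank line terminates it)."""
    data_lines = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"id: {event_id}\nevent: {event}\n{data_lines}\n"


def replay_from(
    log: Iterable[Tuple[int, str, str]], last_event_id: Optional[str]
) -> Iterator[str]:
    """Yield framed events newer than the client's Last-Event-ID header value."""
    try:
        cursor = int(last_event_id) if last_event_id else 0
    except ValueError:
        cursor = 0  # unparseable header: fall back to a full replay
    for event_id, event, data in log:
        if event_id > cursor:
            yield format_sse(event_id, event, data)


log = [(1, "message", "a"), (2, "update", "b"), (3, "update", "c")]
print("".join(replay_from(log, "1")))  # replays events 2 and 3 only
```

A real deployment would bound the log (by size or age) and decide what to do when the requested ID has already been evicted.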
HTTP/2 SSE¶
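Under HTTP/1.1, browsers cap concurrent connections at about 6 per origin, and the cap is shared across tabs. Each open SSE stream holds one of those connections, so once six tabs are streaming, a seventh tab's SSE and XHR/fetch requests to the same origin stall. Under HTTP/2, all SSE streams multiplex over a single TCP connection, and the cap becomes the negotiated maximum number of concurrent streams (100 by default in most implementations), so the problem effectively disappears.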
AI Streaming
SSE is the standard for LLM token streaming. OpenAI, Anthropic, and virtually all LLM APIs stream completions via SSE because data flows in one direction (server → client), SSE is simpler than WebSocket, and auto-reconnect handles transient failures gracefully.
Webhooks¶
Webhooks are HTTP POST callbacks — the server pushes events to client-registered URLs instead of the client polling for changes. "Don't call us, we'll call you."
Flow¶
sequenceDiagram
participant Client
participant YourServer
participant WebhookConsumer
Client->>YourServer: Register webhook URL
Note over YourServer: Event occurs (payment, commit, signup)
YourServer->>WebhookConsumer: POST /webhook {"event": "payment.succeeded", ...}
WebhookConsumer-->>YourServer: 200 OK (within 5s)
Note over WebhookConsumer: Queue event for async processing
Production Webhook Pattern¶
Respond immediately, process asynchronously:
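import json
from fastapi import BackgroundTasks, FastAPI, HTTPException, Request

app = FastAPI()

@app.post("/webhook")
async def webhook_handler(request: Request, background_tasks: BackgroundTasks):
    # 1. Validate the signature FIRST, over the raw bytes rather than the parsed JSON
    body = await request.body()
    # WEBHOOK_SECRET is a placeholder name for the shared secret from your config
    if not verify_signature(request.headers, body, WEBHOOK_SECRET):
        raise HTTPException(status_code=401, detail="invalid signature")
    # 2. Queue the work and return 200 immediately, before any processing
    background_tasks.add_task(process_event, json.loads(body))
    return {"status": "accepted"}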
Never do slow work (DB queries, API calls) in the webhook handler. Return 200 within 5 seconds or the sender will retry.
Security: Signature Verification¶
Every webhook provider should sign payloads. Verify before processing:
import hmac, hashlib

def verify_signature(headers: dict, body: bytes, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    received = headers.get("X-Signature-256", "").removeprefix("sha256=")
    return hmac.compare_digest(expected, received)
Reliability Patterns¶
| Pattern | Purpose |
|---|---|
| Idempotency key | Deduplicate retried deliveries — store processed event IDs |
| Exponential backoff retries | Sender retries on non-2xx: immediately, 5s, 30s, 5m, 30m, 2h |
| Dead letter queue | After N retries, move to DLQ for manual inspection |
| Event replay | Allow consumers to re-request past events by ID |
| CloudEvents format | Standard envelope: id, source, type, time, data |
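The idempotency-key row in the table above can be sketched as follows. This is a minimal illustration, assuming each event carries a unique `id` field (as in the CloudEvents envelope). The in-memory set stands in for a durable store such as a database table, and `IdempotentConsumer` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Set


class IdempotentConsumer:
    """Run the handler at most once per event ID, even if the sender retries."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # assumption: replace with a durable store

    def deliver(self, event: Dict[str, Any]) -> bool:
        """Process `event`; return False if it was a duplicate delivery."""
        event_id = event["id"]
        if event_id in self._seen:
            return False
        self._handler(event)
        self._seen.add(event_id)  # record only after the handler succeeds
        return True


processed = []
consumer = IdempotentConsumer(processed.append)
event = {"id": "evt_1", "type": "payment.succeeded"}
print(consumer.deliver(event), consumer.deliver(event))  # True False
```

Recording the ID only after the handler succeeds means a crash mid-processing leads to a retry rather than a silently dropped event; the handler itself should therefore tolerate being re-run.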
tRPC¶
tRPC lets TypeScript full-stack teams build APIs where type safety flows automatically from server to client — no code generation, no schema files, no out-of-sync types.
How It Works¶
- Define procedures on the server (TypeScript functions)
- Export the router's type
- Import and use that type on the client
- TypeScript infers input/output types automatically
The client never imports server implementation code — only the type. At runtime, tRPC serializes calls over HTTP (queries → GET/POST, mutations → POST, subscriptions → WebSocket).
Routers and Procedures¶
// server/routers/users.ts
import { z } from 'zod';
import { router, publicProcedure, protectedProcedure } from '../trpc';
export const userRouter = router({
// Query — GET /trpc/users.getById
getById: publicProcedure
.input(z.object({ id: z.string() }))
.query(async ({ input, ctx }) => {
return ctx.db.user.findUnique({ where: { id: input.id } });
}),
// Mutation — POST /trpc/users.create
create: protectedProcedure
.input(z.object({ name: z.string(), email: z.string().email() }))
.mutation(async ({ input, ctx }) => {
return ctx.db.user.create({ data: input });
}),
});
// server/routers/_app.ts
export const appRouter = router({
users: userRouter,
posts: postRouter,
comments: commentRouter,
});
export type AppRouter = typeof appRouter; // ← this is all the client needs
Client Usage¶
// client/trpc.ts
import { createTRPCReact } from '@trpc/react-query';
import type { AppRouter } from '../server/routers/_app';
export const trpc = createTRPCReact<AppRouter>();
// In a React component:
function UserProfile({ userId }: { userId: string }) {
// Fully typed: input, output, error — all inferred from server code
const { data, isLoading } = trpc.users.getById.useQuery({ id: userId });
// data is typed as: User | null | undefined
// Change server return type → TypeScript error here immediately
}
Context and Middleware¶
// Context: per-request shared state (auth user, DB, etc.)
export const createContext = async ({ req, res }: CreateNextContextOptions) => ({
db: prisma,
session: await getSession({ req }),
});
// Middleware: wraps procedures with reusable logic
const isAuthenticated = middleware(({ ctx, next }) => {
if (!ctx.session?.user) throw new TRPCError({ code: 'UNAUTHORIZED' });
return next({ ctx: { ...ctx, user: ctx.session.user } });
});
// Protected procedure: any procedure using this is automatically auth-gated
const protectedProcedure = publicProcedure.use(isAuthenticated);
tRPC vs Alternatives¶
| Dimension | tRPC | REST + OpenAPI | GraphQL |
|---|---|---|---|
| Type safety | ✅ Automatic, zero-gen | ⚠️ Code generation required | ⚠️ Code generation required |
| Language support | TypeScript/JS only | Universal | Universal |
| Schema file | ❌ None (types are the schema) | OpenAPI YAML/JSON | .graphql SDL |
| Learning curve | Low (just TypeScript) | Low | High |
| Client flexibility | ❌ Must use tRPC client | ✅ Any HTTP client | ✅ Any GraphQL client |
| Over/under-fetching | Field selection not built-in | Full response always | ✅ Client specifies fields |
| Best for | TypeScript monorepos (T3 stack, Next.js) | Public APIs, polyglot | Complex multi-client frontends |
Choosing the Right API Paradigm¶
Is this a public API consumed by external developers or third parties?
→ REST (universal, familiar, broad tooling)
Is the frontend complex with multiple clients fetching different data shapes?
→ GraphQL (eliminates over/under-fetching, empowers frontend teams)
Is this internal service-to-service communication with high throughput?
→ gRPC (fastest, binary, streaming support, code-gen clients)
Does the data need to flow in real time in both directions?
→ WebSocket (full-duplex, persistent)
Does the server push updates to passive clients (feeds, notifications)?
→ SSE (simpler than WebSocket, HTTP-native, auto-reconnect)
Is the entire stack TypeScript and owned by one team?
→ tRPC (zero boilerplate, type-safe end-to-end)
Does an external system need to notify you when events occur?
→ Webhooks (event-driven push, polling eliminated)
Is this a legacy enterprise or regulated domain (banking, healthcare)?
→ SOAP (accept the complexity; interoperability with existing systems)
It Is Not Either-Or
Real systems commonly use multiple paradigms together: a public REST API for external consumers, gRPC internally between microservices, GraphQL for the customer-facing frontend, WebSocket for real-time features, and webhooks for third-party integrations.
HTTP/2 and HTTP/3 (QUIC)¶
Most of the API protocols above ride on top of HTTP, or at least start from an HTTP handshake, so understanding how the transport has evolved is essential.
HTTP/2 (2015, RFC 7540)¶
HTTP/2 is the minimum transport for gRPC and significantly improves REST/GraphQL performance.
| Feature | HTTP/1.1 | HTTP/2 |
|---|---|---|
| Framing | Text-based | Binary frames |
| Multiplexing | ❌ (one in-flight request per TCP connection) | ✅ Multiple streams per connection |
| Header compression | ❌ | ✅ HPACK |
| Server push | ❌ | ✅ (rarely used in practice) |
| Connection limit | 6 per origin (browser) | 1 TCP connection, many concurrent streams (capped by SETTINGS_MAX_CONCURRENT_STREAMS) |
| Head-of-line blocking | ✅ At HTTP layer | ❌ At HTTP layer — but YES at TCP layer |
The TCP head-of-line blocking problem: if a single TCP packet is lost, ALL HTTP/2 streams on that connection stall until retransmission completes. This is the fundamental limitation HTTP/3 solves.
HTTP/3 (2022, RFC 9114)¶
HTTP/3 replaces TCP with QUIC (UDP-based transport with built-in TLS 1.3).
graph TB
subgraph "HTTP/2 Stack"
H2[HTTP/2] --> TLS2[TLS 1.2/1.3]
TLS2 --> TCP[TCP]
TCP --> IP1[IP]
end
subgraph "HTTP/3 Stack"
H3[HTTP/3] --> QUIC[QUIC\nbuilt-in TLS 1.3]
QUIC --> UDP[UDP]
UDP --> IP2[IP]
end
Key improvements:
| Feature | HTTP/2 (TCP) | HTTP/3 (QUIC) |
|---|---|---|
| Head-of-line blocking | ✅ TCP-level HOL | ❌ Each QUIC stream is delivered independently |
| Connection setup | TCP handshake + TLS handshake (2–3 RTT) | 0-RTT or 1-RTT (TLS built into QUIC) |
| Connection migration | ❌ New connection on network change | ✅ Connection ID survives IP change |
| Packet loss recovery | Entire connection stalls | Only affected stream pauses |
| Congestion control | Kernel TCP (cubic/bbr) | User-space (pluggable, per-connection) |
Connection migration is particularly impactful for mobile APIs: when a phone switches from WiFi to cellular, HTTP/2 drops the TCP connection and must re-handshake. HTTP/3's connection ID persists across network changes — the connection continues seamlessly.
0-RTT resumption: returning clients can send data in the very first packet by reusing a previously negotiated TLS session. Crucial for latency-sensitive API calls on mobile networks.
0-RTT Replay Risk
0-RTT data can be replayed by a network attacker. Only use 0-RTT for idempotent operations (GET). Non-idempotent operations (POST) should wait for the full handshake.
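One way to enforce this at the application layer relies on RFC 8470: a TLS-terminating proxy marks requests that arrived in early data with an `Early-Data: 1` header, and the origin can answer 425 (Too Early) to force a retry after the handshake. The sketch below assumes such a proxy sits in front of the service and that header names arrive with this exact casing; `early_data_status` is a hypothetical helper, and the method allowlist is deliberately conservative.

```python
from typing import Mapping, Optional

# Conservative allowlist: only methods that never change server state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def early_data_status(method: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return 425 if the request must wait for the full handshake, else None."""
    sent_early = headers.get("Early-Data") == "1"
    if sent_early and method.upper() not in SAFE_METHODS:
        return 425  # Too Early: the client should retry once the handshake completes
    return None


print(early_data_status("POST", {"Early-Data": "1"}))  # 425
print(early_data_status("GET", {"Early-Data": "1"}))   # None
```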
gRPC and HTTP/3: gRPC currently requires HTTP/2. Experimental gRPC-over-QUIC implementations exist (e.g., built on the quic-go QUIC library), but the gRPC specification does not officially support HTTP/3 yet. When it does, the independent-stream property of QUIC will eliminate the head-of-line blocking that currently affects multiplexed gRPC connections.
Content Negotiation¶
Content negotiation lets client and server agree on response format:
# Client requests JSON, can accept XML as fallback
GET /v2/orders/42 HTTP/1.1
Accept: application/json, application/xml;q=0.9, */*;q=0.1
Accept-Language: en-US, fr;q=0.5
Accept-Encoding: gzip, br
# Server responds with chosen representation
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Language: en-US
Content-Encoding: br
Vary: Accept, Accept-Language, Accept-Encoding
The Vary header tells caches which request headers affect the response — critical for correct caching behavior.
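Server-side, negotiation comes down to ranking the client's Accept entries by q-value and matching them against what the server can produce. The following is a simplified sketch: it ignores media-type parameters other than `q` and does not apply the specificity precedence rules of RFC 9110, and `parse_accept` and `choose` are illustrative names.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    parsed = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, q = pieces[0].lower(), 1.0  # q defaults to 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type:
            parsed.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order
    return sorted(parsed, key=lambda item: item[1], reverse=True)


def choose(header: str, supported: List[str]) -> Optional[str]:
    """Pick a supported type matching the client's highest-ranked preference."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        for candidate in supported:
            if media_type in ("*/*", candidate) or (
                media_type.endswith("/*") and candidate.startswith(media_type[:-1])
            ):
                return candidate
    return None  # the caller would answer 406 Not Acceptable


accept = "application/json, application/xml;q=0.9, */*;q=0.1"
print(choose(accept, ["application/xml", "application/json"]))  # application/json
```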
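API versioning via content negotiation: the client selects the API version through a versioned media type in the Accept header (for example, a hypothetical Accept: application/vnd.example.v2+json) rather than through the URL.
This is the most RESTful versioning approach (no URL pollution) but less discoverable than URI versioning.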
Architectural Patterns¶
Backend for Frontend (BFF)¶
The BFF pattern creates a dedicated API gateway per client type — each frontend gets an API layer optimized for its specific data needs.
flowchart LR
subgraph Clients
M[Mobile App]
W[Web App]
TV[Smart TV]
end
subgraph BFF Layer
MB[Mobile BFF\nGo / Node.js]
WB[Web BFF\nNode.js]
TB[TV BFF\nNode.js]
end
subgraph Backend Services
US[User Service]
PS[Product Service]
OS[Order Service]
end
M --> MB
W --> WB
TV --> TB
MB --> US & PS & OS
WB --> US & PS & OS
TB --> US & PS
Why BFF over a single gateway:
- Mobile needs minimal payloads; web needs rich data. One API can't optimize for both
- Each BFF aggregates multiple backend calls into one client-optimized response
- Teams can deploy BFFs independently; breaking a mobile BFF doesn't affect web
- Authentication/session management can differ per client type
BFF vs GraphQL: GraphQL solves the over/under-fetching problem with client-specified queries, potentially eliminating the need for separate BFFs. However, BFF is still valuable when:
- Clients need significantly different business logic (not just different fields)
- The team wants to contain complexity behind a simple REST API per client
- Backend services expose gRPC and the BFF translates to REST/JSON for browser clients
GraphQL Persisted Queries¶
Persisted queries replace arbitrary client-sent GraphQL strings with pre-registered query IDs — improving security, performance, and bandwidth.
# Without persisted queries — client sends full query string
POST /graphql
{"query": "query GetUser($id: ID!) { user(id: $id) { name email posts { title } } }", "variables": {"id": "42"}}
# With persisted queries — client sends only the hash
POST /graphql
{"extensions": {"persistedQuery": {"version": 1, "sha256Hash": "ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4062f09b5bad56a6585"}}, "variables": {"id": "42"}}
Automatic persisted queries (APQ) flow (Apollo):
1. Client sends query hash only
2. If server doesn't recognize the hash → returns PersistedQueryNotFound
3. Client retries with full query string + hash
4. Server stores the mapping; subsequent requests use hash only
Benefits:
- Security: in locked-down mode, server rejects any query not in the allowlist — prevents arbitrary query attacks
- Bandwidth: hash (64 chars) replaces potentially multi-KB query strings
- CDN caching: hash-based GET requests are cacheable at edge (GET /graphql?extensions={...}&variables={...})
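The APQ flow above can be sketched server-side as a hash-to-query lookup that registers queries on first sight. This is an illustration of the idea rather than Apollo's implementation: `PersistedQueryStore` is a hypothetical class, the store is an in-memory dict, and a locked-down deployment would pre-populate it and reject registration entirely.

```python
import hashlib
from typing import Dict, Optional


def query_hash(query: str) -> str:
    """SHA-256 hex digest of the query text, as sent in `sha256Hash`."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    def __init__(self) -> None:
        self._queries: Dict[str, str] = {}

    def resolve(self, sha256_hash: str, query: Optional[str] = None) -> str:
        """Return the query text for a hash, registering it when the text is supplied."""
        if query is not None:
            # Step 3: the client retried with the full query plus its hash.
            if query_hash(query) != sha256_hash:
                raise ValueError("provided hash does not match query")
            self._queries[sha256_hash] = query  # step 4: store the mapping
            return query
        if sha256_hash not in self._queries:
            raise LookupError("PersistedQueryNotFound")  # step 2
        return self._queries[sha256_hash]


store = PersistedQueryStore()
query = "query GetUser($id: ID!) { user(id: $id) { name } }"
digest = query_hash(query)
store.resolve(digest, query)           # first request pair registers the query
print(store.resolve(digest) == query)  # True: later requests send the hash only
```

Verifying the hash on registration matters: without it, a client could poison the store by registering one query under another query's hash.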
gRPC Health Checking Protocol¶
gRPC defines a standardized health checking protocol (grpc.health.v1) for load balancers and orchestrators:
syntax = "proto3";
package grpc.health.v1;
service Health {
rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
message HealthCheckRequest {
string service = 1; // empty string = overall server health
}
message HealthCheckResponse {
enum ServingStatus {
UNKNOWN = 0;
SERVING = 1;
NOT_SERVING = 2;
SERVICE_UNKNOWN = 3;
}
ServingStatus status = 1;
}
# Check health with grpcurl
grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check
# Check specific service
grpcurl -plaintext -d '{"service": "orders.OrderService"}' \
localhost:50051 grpc.health.v1.Health/Check
# Kubernetes gRPC health probe (k8s 1.24+)
# In pod spec:
# livenessProbe:
# grpc:
# port: 50051
# service: ""
gRPC Server Reflection¶
Server reflection allows tools like grpcurl to discover services without .proto files — the gRPC equivalent of OpenAPI's /swagger.json:
// Enable reflection in Go gRPC server
import "google.golang.org/grpc/reflection"
s := grpc.NewServer()
pb.RegisterOrderServiceServer(s, &server{})
reflection.Register(s) // enables runtime schema discovery
# Discover all services (requires reflection)
grpcurl -plaintext localhost:50051 list
# Describe a specific service
grpcurl -plaintext localhost:50051 describe orders.OrderService
# Describe a message type
grpcurl -plaintext localhost:50051 describe orders.Order
Disable Reflection in Production
Like GraphQL introspection, gRPC reflection exposes your entire API surface. Disable it in production or restrict to authorized callers only.
API Performance Patterns¶
Request Compression¶
# Client sends compressed body
POST /v2/orders HTTP/1.1
Content-Encoding: gzip
Content-Type: application/json
# Client requests compressed response
GET /v2/orders HTTP/1.1
Accept-Encoding: gzip, br
Brotli (br) achieves 15–25% better compression than gzip for JSON/text payloads, but requires more CPU for compression. Most CDNs pre-compress static assets with Brotli. For dynamic API responses, gzip is usually the better trade-off (faster compression, slightly larger output).
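For the gzip side of that trade-off, a client can compress a JSON request body with the standard library alone. The sketch below uses a made-up order list and a hypothetical helper name, `gzip_json_body`. The printed sizes depend on the payload, so no particular ratio is claimed here.

```python
import gzip
import json
from typing import Any, Dict, Tuple


def gzip_json_body(obj: Any) -> Tuple[bytes, Dict[str, str]]:
    """Serialize `obj` to compact JSON, gzip it, and return (body, request headers)."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
    return gzip.compress(raw, compresslevel=6), headers


# Hypothetical payload; repeated keys are what make JSON compress well.
orders = [{"id": i, "status": "shipped", "currency": "USD"} for i in range(200)]
body, headers = gzip_json_body(orders)
raw_size = len(json.dumps(orders, separators=(",", ":")).encode("utf-8"))
print(f"{raw_size} bytes raw, {len(body)} bytes gzipped")
```

Very small bodies can grow when compressed, so clients often apply a minimum-size threshold before setting Content-Encoding.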
Connection Pooling¶
HTTP/1.1 clients should maintain a connection pool to avoid the overhead of TCP+TLS handshakes per request:
| Setting | Typical Value | Notes |
|---|---|---|
| Pool size (per host) | 20–100 | Match to expected concurrency |
| Idle timeout | 30–90s | Close idle connections to free resources |
| Max lifetime | 5–10 min | Prevent sticky connections to a single backend |
| Health check interval | 10s | Detect dead connections proactively |
HTTP/2 clients typically use a single connection per host carrying many concurrent streams (up to the server's advertised maximum), so connection pooling is less critical but still relevant for fault tolerance (maintain 2–3 connections).
ETag-Based Conditional Requests¶
First request:
GET /v2/orders/42 → 200 OK, ETag: "abc123"
Subsequent request:
GET /v2/orders/42
If-None-Match: "abc123"
→ 304 Not Modified (no body, use cached copy)
→ 200 OK + new ETag (resource changed, here is new version)
ETags reduce bandwidth and server load. For mutable resources, use strong ETags (exact byte-for-byte match). For semantic equivalence, use weak ETags (W/"abc123").
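The exchange above can be sketched as a small server-side helper. It assumes a strong ETag derived from a hash of the response bytes (one common choice, not the only one); `make_etag` and `conditional_get` are illustrative names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted, truncated hash of the exact response bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    # If-None-Match uses weak comparison, so a W/ prefix is ignored when matching.
    normalized = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
    if "*" in normalized or etag in normalized:
        return 304, {"ETag": etag}, b""  # client's cached copy is still valid
    return 200, {"ETag": etag}, body


status, headers, _ = conditional_get(b'{"id": 42}', None)
print(status, conditional_get(b'{"id": 42}', headers["ETag"])[0])  # 200 304
```

Hashing the body still costs a full render on the server; cheaper validators such as a row version or updated-at timestamp avoid that when the data model provides one.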
Async Request Collapsing (Request Deduplication)¶
When multiple clients request the same resource simultaneously, collapse them into a single backend request:
Time T=0: Client A → GET /products/42
Time T=1ms: Client B → GET /products/42 (same key, collapse)
Time T=2ms: Client C → GET /products/42 (same key, collapse)
Time T=50ms: Backend returns → fan out to A, B, C
Result: 1 backend call instead of 3
Implemented in: Nginx (proxy_cache_lock), Varnish (built-in request coalescing), Cloudflare, Envoy.
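Inside an application, the same idea is often called "single flight". The asyncio sketch below is one possible implementation, not how the proxies above do it: the first caller for a key starts the backend call and later callers await the same task. `SingleFlight` and the demo names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Collapse concurrent calls for the same key into one backend call."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[str]"] = {}

    async def do(self, key: str, fetch: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None:
            # Leader: start the backend call and publish it for followers.
            task = asyncio.ensure_future(fetch())
            self._inflight[key] = task
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # shield() keeps one caller's cancellation from cancelling the shared call.
        return await asyncio.shield(task)


async def demo() -> int:
    calls = 0

    async def fetch_product() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)  # simulated 50 ms backend latency
        return "product-42"

    flights = SingleFlight()
    results = await asyncio.gather(
        *(flights.do("/products/42", fetch_product) for _ in range(3))
    )
    assert results == ["product-42"] * 3
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # 1 backend call for 3 concurrent requests
```

This only deduplicates requests that overlap in time; pairing it with a short-lived cache covers requests that arrive just after the first one completes.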
Benchmarks: Protocol Performance¶
Approximate comparisons under controlled conditions. Real-world performance depends heavily on payload, network, and implementation.
| Metric | REST (JSON/HTTP2) | GraphQL (JSON/HTTP2) | gRPC (Protobuf/HTTP2) |
|---|---|---|---|
| Serialization size (1KB logical payload) | ~1.2 KB | ~1.0 KB (no over-fetching) | ~0.4 KB |
| Serialization time | ~1x baseline | ~1x | ~0.1–0.3x (binary) |
| Latency (unary, same DC) | ~1–5ms | ~2–8ms (resolver overhead) | ~0.5–2ms |
| Throughput (single connection) | Multiplexed over HTTP/2; bounded by JSON encode/decode cost | Same as REST | Higher (multiplexed, binary) |
| Browser support | ✅ Native | ✅ Native | ⚠️ grpc-web proxy required |
| Streaming | ❌ (SSE for server-push) | ✅ Subscriptions (WS) | ✅ 4 streaming types |
When Performance Matters Less
For most CRUD APIs, the difference between REST and gRPC latency is negligible compared to database query time. Choose the paradigm based on developer experience and client requirements, not raw protocol speed — unless you're building a low-latency trading system or processing millions of internal RPCs per second.