Architecture
In-depth view of NATS internals: the wire protocol, cluster routes, leaf nodes, gateways, and the JetStream subsystem (including the meta-Raft, asset-Raft, and storage layers).
Component Map
flowchart LR
subgraph Server["nats-server (single binary)"]
direction TB
ClientConn["Client connections\n(TCP 4222 / TLS / WS / MQTT)"]
SubsRouter["Subject router\n(subject tree)"]
AccountIso["Account isolation\n(per-account interest)"]
ClusterRoute["Cluster routes\n(:6222)"]
Gateway["Gateway endpoint\n(:7222)"]
LeafEndpoint["Leaf node endpoint\n(:7422)"]
MQTTBridge["MQTT bridge\n(:1883)"]
WSBridge["WebSocket bridge\n(:80/:443)"]
subgraph JetStreamSubsystem["JetStream subsystem (optional)"]
MetaRaft["Meta-group Raft\n($JS.API)"]
StreamRaft["Per-stream Raft\n(R1/R3/R5)"]
ConsumerRaft["Per-consumer Raft\n(durable consumers)"]
FileStore["File store\n(blocks + idx)"]
MemStore["Memory store"]
ObjectStore["Object Store API"]
KVStore["KV bucket API"]
end
Monitoring["HTTP monitoring\n(:8222)"]
end
ClientConn --> SubsRouter
SubsRouter --> AccountIso
AccountIso --> JetStreamSubsystem
AccountIso --> ClusterRoute
AccountIso --> Gateway
AccountIso --> LeafEndpoint
MQTTBridge --> SubsRouter
WSBridge --> SubsRouter
StreamRaft --> FileStore
StreamRaft --> MemStore
KVStore --> StreamRaft
ObjectStore --> StreamRaft
Components
| Component | Role |
|---|---|
| nats-server | Single static Go binary that hosts every role: client endpoint, route peer, gateway, leaf node endpoint, JetStream node, MQTT bridge, WebSocket bridge, monitoring server. |
| Subject router | Per-account trie that maps subscribers and queue groups to subjects, with wildcard support (* matches one token, a trailing > matches all remaining tokens). |
| Account | Hard isolation boundary. Subjects, JetStream streams, KV buckets, and consumers all live inside an account. Cross-account flow only via explicit imports/exports. |
| System account ($SYS) | Special account used for cluster gossip, JetStream cluster state events, and administrative APIs. |
| Cluster route | Direct full-mesh TCP between servers in one cluster on port 6222 — propagates subscription interest. |
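| Gateway | Cluster-to-cluster link forming a supercluster. Gateways do not replicate the full subject graph: by default they forward optimistically and learn which subjects the remote cluster has no interest in, switching an account to interest-only mode when that list grows too large. |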
| Leaf node | Outbound connection from a remote/edge nats-server, configured under the leafnodes remotes section of its config file, into a hub cluster. Leaf links are account-scoped. |
| JetStream meta group | A cluster-wide Raft group (serving $JS.API.>) that holds stream/consumer assignment metadata. Lives only on JetStream-enabled servers. |
| Stream Raft group | One Raft group per replicated stream (R3, R5). Uses the server's own built-in Raft implementation (NRG), not hashicorp/raft. |
| Consumer Raft group | Independent Raft group per durable consumer for tracking acks and delivery state. |
| File store | On-disk message blocks per stream, indexed for fast lookup by sequence, subject, and time. |
| MQTT bridge | Implements MQTT 3.1.1 on top of JetStream; sessions and retained messages persist in dedicated JetStream streams. |
Subject hierarchy & wildcards
NATS subjects are dot-delimited tokens (orders.created.us-east.123). Two wildcards apply only to subscribers:
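- A * matches exactly one token: orders.*.us-east.123
- A trailing > matches the rest of the subject (one or more tokens) and must be the last token: orders.>

The interest graph is account-local, so a publish on orders.created in account A never reaches an orders.> subscriber in account B unless A exports the subject and B imports it.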
flowchart LR
Pub["Publisher\norders.created.us-east.123"]
SubAll["sub orders.>"]
SubRegion["sub orders.*.us-east.>"]
Q1["queue grp 'workers'\norders.created.>"]
Q2["queue grp 'workers'\norders.created.>"]
Pub --> SubAll
Pub --> SubRegion
Pub -.->|load-balanced| Q1
Pub -.->|load-balanced| Q2
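The wildcard rules above can be sketched in Go as a minimal linear-scan matcher. This is an illustration only: the server indexes subscriptions in a subject tree with caching rather than comparing every pattern, and matchSubject is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSubject reports whether a concrete subject matches a subscription
// pattern using NATS-style wildcards (hypothetical helper):
//   "*" matches exactly one token,
//   ">" matches one or more trailing tokens and must be the last token.
func matchSubject(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			// ">" must be last and needs at least one remaining subject token.
			return i == len(pt)-1 && len(st) > i
		}
		if i >= len(st) {
			return false // pattern has more tokens than the subject
		}
		if p != "*" && p != st[i] {
			return false // literal token mismatch
		}
	}
	return len(pt) == len(st) // no ">" seen: token counts must agree
}

func main() {
	subject := "orders.created.us-east.123"
	for _, p := range []string{"orders.>", "orders.*.us-east.>", "orders.created.>", "orders.*", "billing.>"} {
		fmt.Printf("%-22s %v\n", p, matchSubject(p, subject))
	}
}
```

Note that orders.> does not match the bare subject orders, because > requires at least one token after it.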
Request-Reply
NATS request-reply uses an inbox subject. The client subscribes to a unique reply subject, sets the reply field on the request, and waits.
sequenceDiagram
participant C as Client
participant S as nats-server
participant Svc as Service
C->>S: SUB _INBOX.abc.123
C->>S: PUB svc.req reply=_INBOX.abc.123
S->>Svc: MSG svc.req reply=_INBOX.abc.123
Svc->>S: PUB _INBOX.abc.123 (response)
S->>C: MSG _INBOX.abc.123
Sub-millisecond round trips are typical inside a single cluster.
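The exchange maps onto the text-based NATS client protocol: SUB registers interest, PUB carries an optional reply subject, and the server delivers MSG frames tagged with the subscription id. The sketch below builds those frames for the diagram's flow; the sid values and helper names (subFrame, pubFrame, msgFrame) are hypothetical, and client libraries normally generate these frames for you.

```go
package main

import (
	"fmt"
	"strconv"
)

// subFrame builds a client SUB command: SUB <subject> <sid>\r\n
func subFrame(subject string, sid int) string {
	return "SUB " + subject + " " + strconv.Itoa(sid) + "\r\n"
}

// pubFrame builds a client PUB command: PUB <subject> [reply-to] <#bytes>\r\n<payload>\r\n
// A non-empty reply subject is what turns a plain publish into a request.
func pubFrame(subject, reply string, payload []byte) string {
	head := "PUB " + subject
	if reply != "" {
		head += " " + reply
	}
	return head + " " + strconv.Itoa(len(payload)) + "\r\n" + string(payload) + "\r\n"
}

// msgFrame builds a server-to-client delivery:
// MSG <subject> <sid> [reply-to] <#bytes>\r\n<payload>\r\n
func msgFrame(subject string, sid int, reply string, payload []byte) string {
	head := "MSG " + subject + " " + strconv.Itoa(sid)
	if reply != "" {
		head += " " + reply
	}
	return head + " " + strconv.Itoa(len(payload)) + "\r\n" + string(payload) + "\r\n"
}

func main() {
	inbox := "_INBOX.abc.123" // illustrative inbox, as in the diagram
	frames := []string{
		subFrame(inbox, 1),                            // client registers reply interest first
		pubFrame("svc.req", inbox, []byte("ping")),    // client sends the request
		msgFrame("svc.req", 7, inbox, []byte("ping")), // server -> service (sid 7 is hypothetical)
		pubFrame(inbox, "", []byte("pong")),           // service publishes the response to the inbox
		msgFrame(inbox, 1, "", []byte("pong")),        // server -> client on sid 1
	}
	for _, f := range frames {
		fmt.Printf("%q\n", f)
	}
}
```

Subscribing before publishing matters: if the SUB were sent after the PUB, a fast responder could reply before the server knew about the inbox interest.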
Cluster routing
Cluster peers form a full mesh of routes. Each server exchanges interest updates with its peers so that every server knows which peers have subscribers for any given subject. A publish can therefore be routed from local state alone, with no per-message coordination, and a message crosses at most one route hop because servers never forward route-delivered messages to other routes.
A subscription change travels as an RS+ / RS- operation on the route protocol. Queue-group interest is summarized as a weight per account, subject, and queue rather than one update per queue member.
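A minimal sketch of how one server might track per-route interest and choose targets for a publish, including the one-hop rule. It assumes a simplified RS+/RS- line format (account and subject only, no queue names or weights); routeTable, apply, and targets are hypothetical names, and $G is used as the account name for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// match is a compact NATS-style wildcard matcher: "*" = one token, ">" = the rest.
func match(pattern, subject string) bool {
	pt, st := strings.Split(pattern, "."), strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			return i == len(pt)-1 && len(st) > i
		}
		if i >= len(st) || (p != "*" && p != st[i]) {
			return false
		}
	}
	return len(pt) == len(st)
}

// routeTable is a hypothetical view of remote interest held by one server:
// route peer -> account -> subscribed subject -> subscription count.
type routeTable map[string]map[string]map[string]int

// apply handles a simplified "RS+ <account> <subject>" or
// "RS- <account> <subject>" line received from a route peer.
func (rt routeTable) apply(route, line string) {
	f := strings.Fields(line)
	if len(f) < 3 {
		return
	}
	acc, subj := f[1], f[2]
	if rt[route] == nil {
		rt[route] = map[string]map[string]int{}
	}
	if rt[route][acc] == nil {
		rt[route][acc] = map[string]int{}
	}
	switch f[0] {
	case "RS+":
		rt[route][acc][subj]++
	case "RS-":
		rt[route][acc][subj]--
		if rt[route][acc][subj] <= 0 {
			delete(rt[route][acc], subj)
		}
	}
}

// targets lists the route peers that must receive a publish in account acc.
// A message that arrived from a route is delivered to local clients only and
// is never forwarded to another route, so every message crosses at most one hop.
func (rt routeTable) targets(acc, subject string, fromRoute bool) []string {
	if fromRoute {
		return nil
	}
	var out []string
	for route, accs := range rt {
		for pattern := range accs[acc] {
			if match(pattern, subject) {
				out = append(out, route)
				break
			}
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	rt := routeTable{}
	rt.apply("n2", "RS+ $G orders.>")
	rt.apply("n3", "RS+ $G billing.*")
	fmt.Println(rt.targets("$G", "orders.created", false)) // [n2]
	fmt.Println(rt.targets("$G", "orders.created", true))  // []
	rt.apply("n2", "RS- $G orders.>")
	fmt.Println(rt.targets("$G", "orders.created", false)) // []
}
```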
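Supercluster (gateways)
Gateways do not replicate subscriptions across clusters. By default a gateway link runs in optimistic mode: a publish on a subject that has not been flagged is forwarded to the remote cluster, and if that cluster has no interest it replies with a no-interest notice, after which the sender stops forwarding that subject. When an account accumulates too many no-interest entries, the link switches that account to interest-only mode, in which the remote cluster sends its actual subscription interest and only matching messages cross.
This design keeps inter-cluster traffic proportional to actual cross-cluster demand rather than subscription counts. Trade-off: in optimistic mode, messages without remote interest can cross the gateway until the no-interest notice arrives; the first message does not wait on an extra query round trip.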
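By default a gateway forwards optimistically, learns negative interest from the remote cluster's no-interest replies, and switches an account to interest-only mode once too many of those accumulate. The sketch below models one outbound link for one account. It is hypothetical throughout: literal subjects only, no queue groups, no wire protocol, and an arbitrary threshold where the real server applies its own internal limit.

```go
package main

import "fmt"

// gatewayLink is a hypothetical model of one outbound gateway link for a
// single account. Subjects are literal (no wildcards) to keep it short.
type gatewayLink struct {
	interestOnly bool
	noInterest   map[string]bool // optimistic mode: subjects the remote rejected
	interest     map[string]bool // interest-only mode: subjects the remote wants
	threshold    int             // hypothetical switch-over point
}

func newGatewayLink(threshold int) *gatewayLink {
	return &gatewayLink{
		noInterest: map[string]bool{},
		interest:   map[string]bool{},
		threshold:  threshold,
	}
}

// shouldSend decides whether a local publish crosses the gateway.
func (g *gatewayLink) shouldSend(subject string) bool {
	if g.interestOnly {
		return g.interest[subject]
	}
	return !g.noInterest[subject] // optimistic: send unless told otherwise
}

// onNoInterest records a no-interest reply from the remote cluster and
// reports whether the link is now in interest-only mode.
func (g *gatewayLink) onNoInterest(subject string) bool {
	if g.interestOnly {
		return true
	}
	g.noInterest[subject] = true
	if len(g.noInterest) >= g.threshold {
		g.interestOnly = true
		g.noInterest = map[string]bool{} // negative list no longer needed
	}
	return g.interestOnly
}

// onInterest records positive interest sent by the remote in interest-only mode.
func (g *gatewayLink) onInterest(subject string) {
	g.interest[subject] = true
}

func main() {
	g := newGatewayLink(2)
	fmt.Println(g.shouldSend("orders.eu")) // true: first message goes out optimistically
	g.onNoInterest("orders.eu")
	fmt.Println(g.shouldSend("orders.eu")) // false: remote has no interest
	g.onNoInterest("orders.us")            // threshold reached: interest-only mode
	g.onInterest("orders.apac")
	fmt.Println(g.interestOnly, g.shouldSend("orders.apac"), g.shouldSend("orders.latam")) // true true false
}
```

The design choice is visible in the model: optimistic mode needs no upfront state, while interest-only mode bounds wasted traffic for accounts whose remote interest is sparse.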
Leaf nodes
Leaf nodes connect outbound to a hub. They:
- Carry their own accounts (or map a local subset into a hub account).
- Show the hub only the subjects their account exposes.
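- Buffer JetStream traffic while the link is unhealthy when the leaf runs its own JetStream domain (the domain setting in its jetstream block): local streams keep accepting writes, and hub streams that source from them catch up after reconnect.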
Common patterns:
- Edge factory — leaf node on a Raspberry Pi 5, devices speak MQTT into the leaf, leaf publishes via NATS to the hub.
- Per-tenant leaf — SaaS customers run a leaf on-prem; their data never leaves their network until summary subjects are exported.
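For the per-tenant pattern, the essential decision is which locally published subjects may leave the leaf. A minimal sketch, assuming a hypothetical list of exported patterns; in a real deployment this boundary is expressed through the leaf's account exports and leafnode permissions, not application code, and forwardable is an illustrative name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// match is a compact NATS-style wildcard matcher: "*" = one token, ">" = the rest.
func match(pattern, subject string) bool {
	pt, st := strings.Split(pattern, "."), strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			return i == len(pt)-1 && len(st) > i
		}
		if i >= len(st) || (p != "*" && p != st[i]) {
			return false
		}
	}
	return len(pt) == len(st)
}

// forwardable returns the published subjects allowed to leave the leaf,
// i.e. those covered by at least one exported pattern. exported is a
// hypothetical stand-in for the account's exports/leafnode permissions.
func forwardable(published, exported []string) []string {
	var out []string
	for _, s := range published {
		for _, e := range exported {
			if match(e, s) {
				out = append(out, s)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	published := []string{"telemetry.raw.line1", "summary.hourly.line1", "hr.payroll.run", "summary.daily.plant"}
	fmt.Println(forwardable(published, []string{"summary.>"}))
	// [summary.daily.plant summary.hourly.line1]
}
```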
JetStream
JetStream replaces the historical NATS Streaming server with a fully integrated persistence layer.
Storage model
| Field | Detail |
|---|---|
| Block file | Append-only message blocks; the server chooses the block size internally based on the stream's limits rather than exposing it as a per-stream setting. |
| Index | Per-block index of subject + sequence + timestamp; rebuilt on crash from blocks. |
| Compaction | By age (max_age), size (max_bytes), count (max_msgs), or per-subject (max_msgs_per_subject). |
| Memory store | Same APIs as file store but volatile; useful for ephemeral fan-out. |
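Retention policies
| Policy | Behavior |
|---|---|
| Limits (default) | Messages are kept until a limit is hit; with the default discard policy (old), the oldest messages are then removed. |
| Interest | Messages are kept while at least one consumer with matching interest has not yet acked them. |
| WorkQueue | A message is removed as soon as one consumer acks it; consumers on a work-queue stream must not have overlapping subject filters. |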
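The three policies differ in when an acknowledged message may be deleted. A hypothetical helper, assuming the caller already knows which consumers' filters match the message; removal driven by limits is left out, and the type and function names are illustrative.

```go
package main

import "fmt"

type retention int

const (
	limitsRetention retention = iota
	interestRetention
	workQueueRetention
)

// removableOnAck reports whether a message may be deleted given its acks.
// interested: consumers whose filters match the message.
// acked: consumers that have acknowledged it.
func removableOnAck(p retention, interested, acked map[string]bool) bool {
	switch p {
	case interestRetention:
		for c := range interested {
			if !acked[c] {
				return false // an interested consumer still needs it
			}
		}
		return true
	case workQueueRetention:
		return len(acked) > 0 // the first ack removes it
	default:
		return false // Limits: acks never delete; only limits do
	}
}

func main() {
	interested := map[string]bool{"billing": true, "audit": true}
	acked := map[string]bool{"billing": true}
	fmt.Println(removableOnAck(limitsRetention, interested, acked))   // false
	fmt.Println(removableOnAck(interestRetention, interested, acked)) // false: audit has not acked
	// Work-queue consumers cannot overlap, so exactly one consumer matches each message.
	fmt.Println(removableOnAck(workQueueRetention, map[string]bool{"workers": true}, map[string]bool{"workers": true})) // true
}
```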
Consumer types
| Consumer Style | Push | Pull | Notes |
|---|---|---|---|
| Durable | ✓ | ✓ | Server tracks ack state; survives client restarts. |
| Ephemeral | ✓ | ✓ | Server forgets state once inactive. |
| Queue group | ✓ | – | Push consumers can join a deliver group for load balance. |
| Pull | – | ✓ | Recommended for scale: client batches Fetch(n). |
| Ordered | ✓ | ✓ | Client-managed ephemeral consumer without acks; the client recreates it on any sequence gap. Used when strict in-order replay matters. |
Ack policies: none, all, explicit. Replay policies: instant, original.
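The difference between the ack policies shows up in the consumer's ack floor, the highest sequence up to which everything is acknowledged. A simplified, hypothetical tracker; a real consumer also handles redelivery, ack wait, and max-deliver limits, which are omitted here.

```go
package main

import "fmt"

type ackPolicy int

const (
	ackNone ackPolicy = iota
	ackAll
	ackExplicit
)

// consumerState is a hypothetical, simplified tracker of one consumer's acks.
type consumerState struct {
	policy    ackPolicy
	delivered uint64          // highest stream sequence delivered so far
	pending   map[uint64]bool // delivered but not yet acknowledged
}

func newConsumer(p ackPolicy) *consumerState {
	return &consumerState{policy: p, pending: map[uint64]bool{}}
}

func (c *consumerState) deliver(seq uint64) {
	c.delivered = seq
	if c.policy != ackNone { // AckNone: nothing to track after delivery
		c.pending[seq] = true
	}
}

func (c *consumerState) ack(seq uint64) {
	switch c.policy {
	case ackExplicit: // acknowledges exactly one message
		delete(c.pending, seq)
	case ackAll: // acknowledges this message and everything before it
		for s := range c.pending {
			if s <= seq {
				delete(c.pending, s)
			}
		}
	}
}

// ackFloor is the highest sequence such that every delivered sequence up to it is acked.
func (c *consumerState) ackFloor() uint64 {
	floor := c.delivered
	for s := range c.pending {
		if s-1 < floor {
			floor = s - 1
		}
	}
	return floor
}

func main() {
	for _, p := range []ackPolicy{ackNone, ackAll, ackExplicit} {
		c := newConsumer(p)
		for seq := uint64(1); seq <= 3; seq++ {
			c.deliver(seq)
		}
		c.ack(2)
		// policy, ack floor, pending count: "0 3 0", "1 2 1", "2 0 2"
		fmt.Println(p, c.ackFloor(), len(c.pending))
	}
}
```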
Stream replication
sequenceDiagram
participant P as Publisher
participant L as Leader (n1)
participant F1 as Follower (n2)
participant F2 as Follower (n3)
P->>L: PUB orders.created
L->>L: append to log (fsync is periodic by default)
L->>F1: AppendEntries
L->>F2: AppendEntries
F1->>L: ack
F2->>L: ack
Note right of L: quorum reached (2/3)
L->>P: PubAck (stream seq)
R3 / R5 mean three- or five-server Raft groups. Quorum is ⌊N/2⌋+1, so an R3 group tolerates one failed server and an R5 group tolerates two. Leader election uses standard Raft term + vote semantics.
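The quorum arithmetic as a small Go helper (function names are illustrative):

```go
package main

import "fmt"

// quorum is the majority needed to commit in an n-member Raft group: floor(n/2)+1.
func quorum(n int) int { return n/2 + 1 }

// tolerated is how many members can fail while a quorum remains.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("R%d: quorum %d, tolerates %d failure(s)\n", n, quorum(n), tolerated(n))
	}
}
```

Even sizes raise the quorum without adding fault tolerance (four members tolerate one failure, the same as three), which is why replica counts of 3 and 5 are the usual choices.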
KV bucket
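A KV bucket is a JetStream stream (named KV_<bucket>) over the subjects $KV.<bucket>.>, with max_msgs_per_subject set to the bucket's history depth (1 by default). Put appends a message on $KV.<bucket>.<key>; Get reads the last message on that key's subject; Watch is a JetStream consumer over the subject set. Compare-and-Swap uses the Nats-Expected-Last-Subject-Sequence header.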
Object Store
A bucket is a single stream, OBJ_<bucket>, with two subject spaces:
- $O.<bucket>.C.> for chunks (default 128 KB chunk size).
- $O.<bucket>.M.> for object metadata.
Get streams the chunks back to the client. Replication follows the stream's replica count.
Decentralized auth (Operators / Accounts / Users)
flowchart TB
Op["Operator\n(O-key, signs accounts)"]
SysAcc["System Account ($SYS)\n(A-key)"]
AccA["Account A\n(A-key)"]
AccB["Account B\n(A-key)"]
UserA["User credentials\n(U-key + JWT)"]
UserB["User credentials\n(U-key + JWT)"]
Op --> SysAcc
Op --> AccA
Op --> AccB
AccA --> UserA
AccB --> UserB
Resolver["Account JWT resolver\n(memory / NATS / URL)"]
AccA -. JWT .- Resolver
AccB -. JWT .- Resolver
Server["nats-server\n(trusts Operator key)"]
Resolver --> Server
- NKey: Ed25519 keypair encoded as base32 with a role prefix (O operator, A account, U user, X curve key; seeds begin with S) and a CRC-16 checksum.
- JWT: issued by the Operator (for accounts) or an Account (for users); contains permissions, limits, and signing-key references.
- Resolver: a server discovers account JWTs at runtime via MEMORY (preloaded), the embedded nats resolver, or an external URL.
Performance characteristics
| Workload | Latency / Throughput |
|---|---|
| Core NATS pub/sub same host | sub-microsecond delivery |
| Core NATS pub/sub LAN | ~50–100 µs RTT |
| Core NATS publish, single client | tens of millions msgs/sec/server (small payloads) |
| JetStream R1 file store | hundreds of thousands msgs/sec/server (NVMe) |
| JetStream R3 file store | ~tens of thousands msgs/sec quorum-limited |
| Gateway cross-region first request | Cross-region RTT; no extra query round trip (optimistic forwarding or already-propagated interest) |
Source for benchmarks
Numbers above reflect Synadia and community benchmarks (see nats bench results). Always re-measure for your own hardware/NVMe class before sizing a deployment.
Comparison hooks
- vs Kafka — Kafka wins on broad analytic streaming ecosystem; NATS wins on multi-tenant edge fabrics and operational simplicity.
- vs RabbitMQ — RabbitMQ wins on routing primitives & dead-letter ergonomics; NATS wins on latency and footprint.
- vs Pulsar — Pulsar offers segregated compute/storage and tiered storage; NATS uses local store with mirrors/sources for similar effects but at smaller scale.