Architecture

In-depth view of NATS internals: the wire protocol, cluster routes, leaf nodes, gateways, and the JetStream subsystem (including the meta-Raft, asset-Raft, and storage layers).

Component Map

flowchart LR
    subgraph Server["nats-server (single binary)"]
        direction TB
        ClientConn["Client connections\n(TCP 4222 / TLS / WS / MQTT)"]
        SubsRouter["Subject router\n(subject tree)"]
        AccountIso["Account isolation\n(per-account interest)"]
        ClusterRoute["Cluster routes\n(:6222)"]
        Gateway["Gateway endpoint\n(:7222)"]
        LeafEndpoint["Leaf node endpoint\n(:7422)"]
        MQTTBridge["MQTT bridge\n(:1883)"]
        WSBridge["WebSocket bridge\n(:80/:443)"]
        subgraph JetStreamSubsystem["JetStream subsystem (optional)"]
            MetaRaft["Meta-group Raft\n($JS.API)"]
            StreamRaft["Per-stream Raft\n(R1/R3/R5)"]
            ConsumerRaft["Per-consumer Raft\n(durable consumers)"]
            FileStore["File store\n(blocks + idx)"]
            MemStore["Memory store"]
            ObjectStore["Object Store API"]
            KVStore["KV bucket API"]
        end
        Monitoring["HTTP monitoring\n(:8222)"]
    end
    ClientConn --> SubsRouter
    SubsRouter --> AccountIso
    AccountIso --> JetStreamSubsystem
    AccountIso --> ClusterRoute
    AccountIso --> Gateway
    AccountIso --> LeafEndpoint
    MQTTBridge --> SubsRouter
    WSBridge --> SubsRouter
    StreamRaft --> FileStore
    StreamRaft --> MemStore
    KVStore --> StreamRaft
    ObjectStore --> StreamRaft

Components

| Component | Role |
| --- | --- |
| nats-server | Single static Go binary that hosts every role: client gateway, route peer, leaf node endpoint, JetStream node, MQTT bridge, WebSocket bridge, monitoring server. |
| Subject router | Per-account trie that maps subscribers/queue groups to subjects with wildcard support (* token, > rest). |
| Account | Hard isolation boundary. Subjects, JetStream streams, KV buckets, and consumers all live inside an account. Cross-account flow only via explicit imports/exports. |
| System account ($SYS) | Special account used for cluster gossip, JetStream cluster state events, and administrative APIs. |
| Cluster route | Direct full-mesh TCP between servers in one cluster on port 6222; propagates subscription interest. |
| Gateway | Cluster-to-cluster link forming a supercluster. Gossip is summarized: gateways carry interest queries, not the full subject graph. |
| Leaf node | Outbound connection from a remote/edge nats-server into a hub cluster. Leaf links are account-scoped. |
| JetStream meta group | A cluster-wide Raft group ($JS.API.*) that holds stream/consumer assignment metadata. Lives only on JetStream-enabled servers. |
| Stream Raft group | One Raft group per replicated stream (R3, R5). Uses the Raft implementation built into nats-server, not an external library such as hashicorp/raft. |
| Consumer Raft group | Independent Raft group per durable consumer for tracking acks and delivery state. |
| File store | On-disk blocks per stream, indexed for fast subject + time range queries. |
| MQTT bridge | Implements MQTT 3.1.1 over JetStream; sessions and retained messages persist in JetStream streams. |

Subject hierarchy & wildcards

NATS subjects are dot-delimited tokens (orders.created.us-east.123). Two wildcards apply only to subscribers:

  • * — single token (orders.*.us-east.123)
  • > — rest of the subject (orders.>)

The interest graph is account-local, so orders.> in account A never reaches account B.
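The wildcard rules above can be sketched as a small token matcher. This is an illustrative stand-in, not the server's trie: `subject_matches` is a hypothetical helper, and it assumes `>` is only valid as the final token and must cover at least one token.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a subscription pattern matches a published subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' must be the last pattern token and must cover >= 1 token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if tok != "*" and tok != s_tokens[i]:
            return False  # literal token mismatch
    # No '>' seen: token counts must be identical.
    return len(p_tokens) == len(s_tokens)
```

A per-account router would keep one such interest set per account, which is why a match in account A is never evaluated against subscriptions in account B.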

flowchart LR
    Pub["Publisher\norders.created.us-east.123"]
    SubAll["sub orders.>"]
    SubRegion["sub orders.*.us-east.>"]
    Q1["queue grp 'workers'\norders.created.>"]
    Q2["queue grp 'workers'\norders.created.>"]
    Pub --> SubAll
    Pub --> SubRegion
    Pub -.->|load-balanced| Q1
    Pub -.->|load-balanced| Q2
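The delivery rule in the diagram (every plain subscriber gets a copy, each queue group gets exactly one) can be sketched as follows. The `deliver` function and its input shape are hypothetical, and picking a random group member is an assumption made for the sketch.

```python
import random
from collections import defaultdict

def deliver(matching_subs, rng=random):
    """Pick delivery targets for one message.

    matching_subs: list of (subscriber_id, queue_group or None) whose
    subject pattern already matched the published subject.
    """
    targets = []
    groups = defaultdict(list)
    for sub_id, queue in matching_subs:
        if queue is None:
            targets.append(sub_id)        # plain subscriber: always gets a copy
        else:
            groups[queue].append(sub_id)  # queue member: candidate only
    for members in groups.values():
        targets.append(rng.choice(members))  # exactly one member per group
    return targets
```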

Request-Reply

NATS request-reply uses an inbox subject. The client subscribes to a unique reply subject, sets the reply field on the request, and waits.

sequenceDiagram
    participant C as Client
    participant S as nats-server
    participant Svc as Service
    C->>C: subscribe _INBOX.abc.123
    C->>S: PUB svc.req reply=_INBOX.abc.123
    S->>Svc: MSG svc.req reply=_INBOX.abc.123
    Svc->>S: PUB _INBOX.abc.123 (response)
    S->>C: MSG _INBOX.abc.123

Sub-millisecond round trips are typical inside a single cluster.
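The inbox pattern can be sketched with a toy in-process broker. `Broker` and `request` are hypothetical stand-ins for nats-server and a client library, the broker matches exact subjects only, and delivery is synchronous so no timeout handling is shown.

```python
import itertools

class Broker:
    """Toy stand-in for nats-server: exact-match subjects, one sub each."""

    def __init__(self):
        self.subs = {}  # subject -> callback(subject, reply, payload)

    def subscribe(self, subject, callback):
        self.subs[subject] = callback

    def unsubscribe(self, subject):
        self.subs.pop(subject, None)

    def publish(self, subject, payload, reply=None):
        callback = self.subs.get(subject)
        if callback is not None:
            callback(subject, reply, payload)

_inbox_ids = itertools.count(1)

def request(broker, subject, payload):
    """Subscribe to a unique inbox, publish with reply set, collect the answer."""
    inbox = f"_INBOX.demo.{next(_inbox_ids)}"
    responses = []
    broker.subscribe(inbox, lambda _s, _r, data: responses.append(data))
    broker.publish(subject, payload, reply=inbox)
    broker.unsubscribe(inbox)
    return responses[0] if responses else None

# A service answers by publishing to whatever reply subject it was given.
broker = Broker()
broker.subscribe("svc.req", lambda _s, reply, data: broker.publish(reply, data.upper()))
```

Calling `request(broker, "svc.req", b"ping")` walks the same five steps as the sequence diagram; a request to a subject with no service returns `None` in this sketch.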

Cluster routing

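Cluster peers form a full mesh of routes. Each server exchanges interest updates so that every peer knows which other peers have subscribers for any given subject. On the publish path a server therefore consults only its local interest table and forwards a message directly to the peers that need it, with no per-message query to the rest of the cluster.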

A subscription update is an RS+ / RS- operation on the route protocol. Optimization: routes carry per-account interest summaries, not per-subject details for high-fanout sets.
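A minimal sketch of the bookkeeping behind RS+ / RS-, assuming one update per (account, subject) per peer and exact-match subjects. `RouteInterest` and its method names are hypothetical; the real route protocol also handles wildcards and queue groups.

```python
from collections import defaultdict

class RouteInterest:
    """Per-server table of which remote peers want which subjects."""

    def __init__(self):
        # (account, subject) -> set of peer names with interest
        self.interest = defaultdict(set)

    def rs_plus(self, peer, account, subject):
        """Peer announced interest (RS+)."""
        self.interest[(account, subject)].add(peer)

    def rs_minus(self, peer, account, subject):
        """Peer withdrew interest (RS-)."""
        key = (account, subject)
        peers = self.interest.get(key)
        if peers is None:
            return
        peers.discard(peer)
        if not peers:
            del self.interest[key]  # keep the table free of empty entries

    def peers_for(self, account, subject):
        """Local lookup used on the publish path; no network round trip."""
        return set(self.interest.get((account, subject), ()))
```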

Supercluster (gateways)

Gateways are not full meshes of subscriptions; they exchange interest queries lazily. When a publish on cluster A targets a subject with no local interest, the cluster asks adjacent gateways "do you have interest in X?" Gateways cache responses with a TTL.

This design keeps inter-cluster traffic proportional to actual cross-cluster demand rather than subscription counts. Trade-off: first-message latency across the gateway includes a query round trip.
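The query-then-cache behaviour described above can be sketched as a TTL cache in front of a remote lookup. Everything here is illustrative: `GatewayInterestCache`, the `query_remote` callable, and the 30-second default TTL are assumptions, not server settings.

```python
import time

class GatewayInterestCache:
    """Caches 'does the remote cluster want subject X?' answers with a TTL."""

    def __init__(self, query_remote, ttl_seconds=30.0, clock=time.monotonic):
        self.query_remote = query_remote  # hypothetical callable: subject -> bool
        self.ttl = ttl_seconds            # assumed value, for illustration only
        self.clock = clock
        self.cache = {}                   # subject -> (has_interest, expires_at)
        self.queries = 0                  # number of round trips actually made

    def has_interest(self, subject):
        now = self.clock()
        entry = self.cache.get(subject)
        if entry is not None and entry[1] > now:
            return entry[0]               # fresh cached answer, no round trip
        self.queries += 1                 # first message (or expired entry) pays the query
        answer = self.query_remote(subject)
        self.cache[subject] = (answer, now + self.ttl)
        return answer
```

Only the first publish per subject (and the first after expiry) increments `queries`, which is the first-message latency trade-off noted above.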

Leaf nodes

Leaf nodes connect outbound to a hub. They:

  1. Carry their own accounts (or map a local subset into a hub account).
  2. Show the hub only the subjects their account exposes.
  3. Buffer JetStream traffic when the link is unhealthy if domain= is set.

Common patterns:

  • Edge factory — leaf node on a Raspberry Pi 5, devices speak MQTT into the leaf, leaf publishes via NATS to the hub.
  • Per-tenant leaf — SaaS customers run a leaf on-prem; their data never leaves their network until summary subjects are exported.

JetStream

JetStream replaces the historical NATS Streaming server with a fully integrated persistence layer.

Storage model

| Field | Detail |
| --- | --- |
| Block file | Append-only, sized via max_msgs_per_block or default 256 MB. |
| Index | Per-block index of subject + sequence + timestamp; rebuilt on crash from blocks. |
| Compaction | By age (max_age), size (max_bytes), count (max_msgs), or per-subject (max_msgs_per_subject). |
| Memory store | Same APIs as file store but volatile; useful for ephemeral fan-out. |

Retention policies

| Policy | Behavior |
| --- | --- |
| Limits (default) | Discard oldest when limits hit. |
| Interest | Keep messages while at least one consumer hasn't acked. |
| WorkQueue | Each message must be consumed exactly once across consumers. |
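A minimal in-memory sketch of the Limits policy, assuming only the count-based limits (max_msgs and max_msgs_per_subject) and discard-oldest behaviour. `LimitsStream` is a hypothetical class; age and byte limits are omitted.

```python
from collections import OrderedDict

class LimitsStream:
    """Toy stream that discards the oldest messages once a limit is hit."""

    def __init__(self, max_msgs=None, max_msgs_per_subject=None):
        self.max_msgs = max_msgs
        self.max_per_subject = max_msgs_per_subject
        self.msgs = OrderedDict()  # seq -> (subject, payload), oldest first
        self.last_seq = 0

    def publish(self, subject, payload):
        self.last_seq += 1
        self.msgs[self.last_seq] = (subject, payload)
        if self.max_per_subject is not None:
            # Sequences for this subject, oldest first.
            seqs = [s for s, (subj, _) in self.msgs.items() if subj == subject]
            for s in seqs[:max(0, len(seqs) - self.max_per_subject)]:
                del self.msgs[s]
        if self.max_msgs is not None:
            while len(self.msgs) > self.max_msgs:
                self.msgs.popitem(last=False)  # drop the oldest sequence
        return self.last_seq
```

Sequence numbers keep increasing even as old messages are discarded, so a gap at the front of the stream is expected under this policy.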

Consumer types

| Consumer style | Notes |
| --- | --- |
| Durable | Server tracks ack state; survives client restarts. |
| Ephemeral | Server forgets state once inactive. |
| Queue group | Push consumers can join a deliver group for load balancing. |
| Pull | Recommended for scale: client batches Fetch(n). |
| Ordered | Single-active replay, used when strict order matters. |

Ack policies: none, all, explicit. Replay policies: instant, original.
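How a pull consumer with the explicit ack policy tracks delivery state can be sketched as below. `ExplicitAckPullConsumer` is a hypothetical toy over an in-memory list; ack timeouts and redelivery scheduling are left out.

```python
class ExplicitAckPullConsumer:
    """Toy pull consumer: Fetch(n) batches plus per-message acks."""

    def __init__(self, stream):
        self.stream = stream  # list of (seq, payload) in stream order
        self.cursor = 0       # index of the next never-delivered message
        self.pending = {}     # seq -> payload, delivered but not yet acked

    def fetch(self, n):
        """Return up to n new messages and mark them as pending."""
        batch = self.stream[self.cursor:self.cursor + n]
        self.cursor += len(batch)
        self.pending.update(batch)
        return batch

    def ack(self, seq):
        """Explicit ack: only this sequence is cleared."""
        self.pending.pop(seq, None)

    def redeliverable(self):
        """Sequences a server would eventually redeliver if never acked."""
        return sorted(self.pending)
```

With the `all` policy, acking a sequence would instead clear every pending sequence up to and including it; the sketch shows only `explicit`.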

Stream replication

sequenceDiagram
    participant P as Publisher
    participant L as Leader (n1)
    participant F1 as Follower (n2)
    participant F2 as Follower (n3)
    P->>L: PUB orders.created
    L->>L: append to log, fsync
    L->>F1: AppendEntries
    L->>F2: AppendEntries
    F1->>L: ack
    F2->>L: ack
    Note right of L: quorum reached (2/3)
    L->>P: PubAck (stream seq)

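R3 and R5 denote three- and five-server Raft groups. Quorum is floor(N/2)+1, so an R3 stream commits once 2 servers have the entry and tolerates one failed server, while R5 needs 3 and tolerates two. Leader election uses standard Raft term + vote semantics.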
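The quorum arithmetic is small enough to check directly. The helper names are hypothetical, and `acks` is assumed to count the leader's own append as well as follower acknowledgements.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a Raft group of the given size."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one member")
    return replicas // 2 + 1

def tolerated_failures(replicas: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return replicas - quorum(replicas)

def is_committed(acks: int, replicas: int) -> bool:
    """True once enough members (leader included) have persisted the entry."""
    return acks >= quorum(replicas)
```

This also shows why even replica counts buy nothing: a four-member group needs 3 for quorum and still tolerates only one failure.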

KV bucket

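A KV bucket is a JetStream stream with max_msgs_per_subject=1 (the default history depth) and the subject prefix $KV.<bucket>. followed by the key, so each key maps to one subject. Put appends a message; Get reads the last message on the key's subject; Watch is a JetStream consumer over the subject set. Compare-and-swap sets the Nats-Expected-Last-Subject-Sequence header, and the write is rejected if the key's last sequence no longer matches.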
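The mapping from KV operations to stream operations can be sketched in memory. `KVBucket` and `WrongLastSequence` are hypothetical names, the `expected_last_subject_seq` argument stands in for the header, and treating an expected value of 0 as "key must not exist yet" is an assumption of this sketch.

```python
class WrongLastSequence(Exception):
    """Raised when the expected last subject sequence does not match."""

class KVBucket:
    """Toy KV view over a stream that keeps one message per subject."""

    def __init__(self, bucket):
        self.prefix = f"$KV.{bucket}."
        self.stream_seq = 0
        self.last = {}  # subject -> (seq, value); older revisions are dropped

    def put(self, key, value, expected_last_subject_seq=None):
        subject = self.prefix + key
        current_seq = self.last.get(subject, (0, None))[0]
        if (expected_last_subject_seq is not None
                and expected_last_subject_seq != current_seq):
            raise WrongLastSequence(
                f"{subject}: expected {expected_last_subject_seq}, have {current_seq}")
        self.stream_seq += 1  # Put appends to the stream
        self.last[subject] = (self.stream_seq, value)
        return self.stream_seq

    def get(self, key):
        """Return (revision, value) for the last message on the key's subject."""
        return self.last.get(self.prefix + key)
```

A typical optimistic update reads the revision with `get`, computes the new value, and passes that revision back to `put`; a concurrent writer causes `WrongLastSequence` and the caller retries.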

Object Store

A bucket is backed by a single stream, OBJ_<bucket>, that carries two kinds of subjects:

  • $O.<bucket>.C.> for chunks (default 128 KB chunk size)
  • $O.<bucket>.M.> for object metadata

Get streams chunks back to the client. Replication mirrors stream R-factor.
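Chunking and reassembly can be sketched as below, using the 128 KB default quoted above. `put_object` and `get_object` are hypothetical helpers, and the metadata fields (name, size, chunks, a SHA-256 digest) are illustrative rather than the exact wire format.

```python
import base64
import hashlib

CHUNK_SIZE = 128 * 1024  # default chunk size quoted above

def _digest(data: bytes) -> str:
    raw = hashlib.sha256(data).digest()
    return "SHA-256=" + base64.urlsafe_b64encode(raw).decode("ascii")

def put_object(name, data, chunk_size=CHUNK_SIZE):
    """Split data into chunk messages and build one metadata record."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    meta = {"name": name, "size": len(data),
            "chunks": len(chunks), "digest": _digest(data)}
    return meta, chunks

def get_object(meta, chunks):
    """Reassemble chunks in order and verify them against the metadata."""
    data = b"".join(chunks)
    if len(data) != meta["size"] or _digest(data) != meta["digest"]:
        raise ValueError("object is corrupt or incomplete")
    return data
```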

Decentralized auth (Operators / Accounts / Users)

flowchart TB
    Op["Operator\n(O-key, signs accounts)"]
    SysAcc["System Account ($SYS)\n(A-key)"]
    AccA["Account A\n(A-key)"]
    AccB["Account B\n(A-key)"]
    UserA["User credentials\n(U-key + JWT)"]
    UserB["User credentials\n(U-key + JWT)"]
    Op --> SysAcc
    Op --> AccA
    Op --> AccB
    AccA --> UserA
    AccB --> UserB
    Resolver["Account JWT resolver\n(memory / NATS / URL)"]
    AccA -. JWT .- Resolver
    AccB -. JWT .- Resolver
    Server["nats-server\n(trusts Operator key)"]
    Resolver --> Server

  • NKey: Ed25519 keypair encoded in custom base32 with role prefix (O/A/U/X).
  • JWT: issued by the Operator (for accounts) or Account (for users); contains permissions, limits, and signing-key references.
  • Resolver: a server discovers account JWTs at runtime via MEMORY (preloaded), the embedded nats resolver, or an external URL.
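How a role prefix ends up as the first character of an encoded key can be sketched with the standard library. This assumes the layout is one prefix byte, the 32-byte Ed25519 public key, and a little-endian CRC-16 checksum, all base32-encoded; the prefix values are chosen so the top five bits select the role letter. Treat the details as illustrative and check them against the nkeys library before relying on them. The all-zero key is a placeholder, not a real public key.

```python
import base64
import binascii

# Prefix bytes: (letter index in the base32 alphabet) << 3.
PREFIX_BYTES = {"O": 14 << 3, "A": 0 << 3, "U": 20 << 3, "X": 23 << 3}

def encode_public_key(role: str, raw_key: bytes) -> str:
    """Encode a 32-byte public key with a role prefix and checksum."""
    if len(raw_key) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    body = bytes([PREFIX_BYTES[role]]) + raw_key
    crc = binascii.crc_hqx(body, 0)  # CRC-16 with polynomial 0x1021
    encoded = base64.b32encode(body + crc.to_bytes(2, "little"))
    return encoded.decode("ascii").rstrip("=")

def role_of(encoded: str) -> str:
    """Verify the checksum and return the role letter."""
    raw = base64.b32decode(encoded)
    body, crc = raw[:-2], int.from_bytes(raw[-2:], "little")
    if binascii.crc_hqx(body, 0) != crc:
        raise ValueError("checksum mismatch")
    return encoded[0]
```

Because the role is visible in the first character, a server or tool can reject a user key presented where an account key is expected before doing any signature work.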

Performance characteristics

| Workload | Latency / Throughput |
| --- | --- |
| Core NATS pub/sub, same host | sub-microsecond delivery |
| Core NATS pub/sub, LAN | ~50–100 µs RTT |
| Core NATS publish, single client | tens of millions msgs/sec/server (small payloads) |
| JetStream R1 file store | hundreds of thousands msgs/sec/server (NVMe) |
| JetStream R3 file store | tens of thousands msgs/sec, quorum-limited |
| Gateway cross-region, first request | RTT + gateway query (~1 RTT extra) |

Source for benchmarks

Numbers above reflect Synadia and community benchmarks (see nats bench results). Always re-measure for your own hardware/NVMe class before sizing a deployment.

Comparison hooks

  • vs Kafka — Kafka wins on broad analytic streaming ecosystem; NATS wins on multi-tenant edge fabrics and operational simplicity.
  • vs RabbitMQ — RabbitMQ wins on routing primitives & dead-letter ergonomics; NATS wins on latency and footprint.
  • vs Pulsar — Pulsar offers segregated compute/storage and tiered storage; NATS uses local store with mirrors/sources for similar effects but at smaller scale.