Architecture¶
Apache Pulsar's three-tier architecture: stateless brokers for serving, BookKeeper bookies for storage, and ZooKeeper (or an alternative metadata store) for coordination and metadata.
Component Overview¶
flowchart TB
Client["Producer / Consumer"]
subgraph Brokers["Pulsar Brokers (stateless)"]
Broker["PulsarBroker\n- ManagedLedger\n- ManagedCursor\n- NamespaceBundle owner\n- Subscriptions\n- Functions runtime"]
end
subgraph BK["Apache BookKeeper"]
Bookie["BookKeeperBookie\n- Journal disk\n- Ledger disk\n- Garbage collection"]
AutoRecovery["BK AutoRecovery\n(Auditor + Replication Worker)"]
end
subgraph Metadata["Metadata Layer"]
ZK["ZooKeeper / etcd / RocksDB"]
ConfigStore["Configuration Store\n(per-instance, global)"]
end
subgraph Tiered["Tiered Storage"]
Offloader["TieredStorageOffloader"]
Cloud["S3 / GCS / Azure Blob"]
end
Client --> Broker
Broker --> Bookie
Broker --> ZK
Bookie --> ZK
AutoRecovery --> ZK
AutoRecovery --> Bookie
Broker --> Offloader
Offloader --> Cloud
Broker -.cluster info.- ConfigStore
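From a client's perspective, only the brokers are visible; bookies and the metadata layer sit behind them. A minimal sketch using the Java client, assuming a broker at localhost:6650 and a hypothetical subscription name:

```java
import org.apache.pulsar.client.api.*;

public class MinimalClient {
    public static void main(String[] args) throws Exception {
        // Clients connect to brokers only; bookies and the metadata
        // layer stay invisible behind the broker.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://my-tenant/ns-prod/orders")
                .create();
        producer.send("hello".getBytes());

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://my-tenant/ns-prod/orders")
                .subscriptionName("demo-sub") // hypothetical subscription name
                .subscribe();
        consumer.acknowledge(consumer.receive());

        consumer.close();
        producer.close();
        client.close();
    }
}
```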
Layer responsibilities¶
| Layer | Holds | State? |
|---|---|---|
| Broker | Topic ownership (via NamespaceBundles), subscriptions, ManagedLedger / ManagedCursor handles, Pulsar Functions instances | Stateless w.r.t. messages |
| BookKeeper bookies | Persistent message data as ledgers (sequences of entries) | Stateful |
| ZooKeeper / etcd | Cluster topology, ledger metadata, broker–bundle ownership, schemas | Stateful (small) |
| Configuration Store | Tenants, namespaces, policies, geo-replication topology | Stateful (small) |
| Tiered storage | Offloaded ledgers | Stateful (large) |
Tenant / Namespace / Topic Hierarchy¶
my-tenant/
├── ns-prod/
│ ├── persistent://my-tenant/ns-prod/orders (partitioned 12 ways)
│ └── persistent://my-tenant/ns-prod/audit-log
└── ns-staging/
└── non-persistent://my-tenant/ns-staging/debug
- Tenant — security boundary; managed by an admin.
- Namespace — policy boundary (retention, replication, schema, dispatch quotas).
- Topic — the actual message stream: persistent://... (durable) or non-persistent://... (best-effort).
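The hierarchy above can be created with the Java admin client. A minimal sketch, assuming an admin endpoint at localhost:8080; the ops-team role and us-east cluster name are illustrative:

```java
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class CreateHierarchy {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        // Tenant: security/administrative boundary.
        admin.tenants().createTenant("my-tenant",
                TenantInfo.builder()
                        .adminRoles(Set.of("ops-team"))     // hypothetical role
                        .allowedClusters(Set.of("us-east")) // hypothetical cluster
                        .build());

        // Namespace: policy boundary.
        admin.namespaces().createNamespace("my-tenant/ns-prod");

        // Topic: the message stream itself; 12 partitions as in the tree above.
        admin.topics().createPartitionedTopic(
                "persistent://my-tenant/ns-prod/orders", 12);

        admin.close();
    }
}
```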
Topic Storage Model — Ledgers and Segments¶
A topic's data is stored as a chain of ledgers (BookKeeper's append-only storage unit, often called segments); each ledger is a sequence of entries striped across bookies. The broker's ManagedLedger abstraction handles ledger creation, rollover, and trimming.
flowchart LR
subgraph Topic["persistent://my-tenant/ns/orders"]
L1["Ledger 1\n(closed)"]
L2["Ledger 2\n(closed)"]
L3["Ledger 3\n(open)"]
end
subgraph Bookies["BookKeeper bookies"]
BkA["Bookie A"]
BkB["Bookie B"]
BkC["Bookie C"]
end
L1 --> BkA
L1 --> BkB
L1 --> BkC
L2 --> BkA
L2 --> BkB
L2 --> BkC
L3 --> BkA
L3 --> BkB
L3 --> BkC
OffStore["Tiered storage S3"]
L1 -. offload .-> OffStore
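The ledger chain is observable per topic. A sketch using the Java admin client's getInternalStats, assuming the same local admin endpoint as above:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class LedgerChain {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        // Internal stats expose the ManagedLedger's ledger chain for a topic,
        // including whether each ledger has been offloaded to tiered storage.
        PersistentTopicInternalStats stats =
                admin.topics().getInternalStats("persistent://my-tenant/ns-prod/orders");

        stats.ledgers.forEach(l -> System.out.printf(
                "ledger %d: entries=%d size=%d offloaded=%b%n",
                l.ledgerId, l.entries, l.size, l.offloaded));

        admin.close();
    }
}
```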
Write quorum / Ack quorum / Ensemble size¶
BookKeeper writes use three numbers (Eq, Wq, Aq):
| Param | Meaning |
|---|---|
| Ensemble size (Eq) | Number of bookies the ledger's entries are striped across. |
| Write quorum (Wq) | Number of bookies each individual entry is written to. |
| Ack quorum (Aq) | Number of bookies that must ack an entry before the write is considered durable. |
Common production values: Eq=3, Wq=3, Aq=2 (the invariant Eq ≥ Wq ≥ Aq always holds). Performance and durability trade-offs flow from these settings: lowering Aq cuts tail latency at the cost of a thinner durability margin.
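These values are configured per namespace. A sketch using the Java admin client's setPersistence, assuming a local admin endpoint; the PersistencePolicies arguments are (ensemble, write quorum, ack quorum, mark-delete throttle rate):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class QuorumConfig {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        // Eq=3, Wq=3, Aq=2: write every entry to all 3 bookies in the
        // ensemble (Eq == Wq, so no striping) and treat it as durable
        // once 2 of them have acked.
        admin.namespaces().setPersistence("my-tenant/ns-prod",
                new PersistencePolicies(3, 3, 2, 0.0));

        admin.close();
    }
}
```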
Subscription Types¶
flowchart LR
Topic["Topic"]
Sub1["Subscription S1\n(Exclusive)"]
Sub2["Subscription S2\n(Failover)"]
Sub3["Subscription S3\n(Shared)"]
Sub4["Subscription S4\n(Key_Shared)"]
Topic --> Sub1
Topic --> Sub2
Topic --> Sub3
Topic --> Sub4
C1["Consumer A"]
C2["Consumer B"]
C3["Consumer C"]
Sub1 --> C1
Sub2 --> C1
Sub2 -. standby .- C2
Sub3 --> C1
Sub3 --> C2
Sub3 --> C3
Sub4 -- "key=k1" --> C1
Sub4 -- "key=k2" --> C2
Sub4 -- "key=k3" --> C3
| Type | Behavior |
|---|---|
| Exclusive | Only one consumer at a time; second consumer is rejected. |
| Failover | One active, others standby. New active picked on failure. |
| Shared (Round-robin) | Messages distributed round-robin across all consumers; per-key ordering is not preserved. |
| Key_Shared | Messages with the same key always route to the same consumer (sticky hash, auto-split). |
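The subscription type is chosen by each consumer at subscribe time. A minimal Java sketch showing two of the four types, assuming a local broker:

```java
import org.apache.pulsar.client.api.*;

public class SubscriptionModes {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();

        // Key_Shared: messages with the same key stick to the same consumer.
        Consumer<byte[]> keyShared = client.newConsumer()
                .topic("persistent://my-tenant/ns-prod/orders")
                .subscriptionName("S4")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        // Failover: one active consumer, the rest standby.
        Consumer<byte[]> failover = client.newConsumer()
                .topic("persistent://my-tenant/ns-prod/orders")
                .subscriptionName("S2")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        keyShared.close();
        failover.close();
        client.close();
    }
}
```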
Geo-Replication¶
flowchart LR
subgraph US["us-east cluster"]
BrokerUS["broker"]
BookieUS["bookie"]
ZkUS["ZK"]
end
subgraph EU["eu-west cluster"]
BrokerEU["broker"]
BookieEU["bookie"]
ZkEU["ZK"]
end
subgraph APAC["ap-southeast cluster"]
BrokerAP["broker"]
BookieAP["bookie"]
ZkAP["ZK"]
end
Global["Configuration Store\n(global ZK)"]
ZkUS -.-> Global
ZkEU -.-> Global
ZkAP -.-> Global
BrokerUS <-->|geo-replicate| BrokerEU
BrokerEU <-->|geo-replicate| BrokerAP
BrokerUS <-->|geo-replicate| BrokerAP
A namespace can be configured with a list of clusters; each broker then runs an internal replicator (a durable cursor plus a remote producer) that re-publishes locally produced messages to the remote cluster's copy of the topic. Each cluster keeps its own full copy of the data.
Mechanism summary:
- Per-namespace replication clusters are set via `pulsar-admin namespaces set-clusters` (Java equivalent sketched below).
- A replication cursor tracks the delivery position for each remote cluster.
- Conflict handling: replication is append-only with no merge semantics, so writes from different clusters to the same topic simply interleave; in practice, design topics so each cluster produces to its own topic or partition.
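A sketch of the admin-API equivalent of the CLI command above, using the cluster names from the diagram and an assumed local admin endpoint:

```java
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class GeoReplication {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        // Equivalent to: pulsar-admin namespaces set-clusters \
        //   --clusters us-east,eu-west,ap-southeast my-tenant/ns-prod
        admin.namespaces().setNamespaceReplicationClusters(
                "my-tenant/ns-prod",
                Set.of("us-east", "eu-west", "ap-southeast"));

        admin.close();
    }
}
```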
Pulsar Functions¶
flowchart LR
Input["Input topic A"]
Func["PulsarFunction\n(user code)"]
Output["Output topic B"]
DLQ["Dead-letter topic"]
Input --> Func
Func --> Output
Func -. on error .-> DLQ
Functions can run inside the broker (lightweight; fine for development) or on a separate Functions Worker cluster (recommended in production). Pulsar IO source/sink connectors run on the same Functions runtime.
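A function is just a class implementing the Pulsar Functions SDK interface; the runtime wires up input, output, and dead-letter topics from the function's configuration. A minimal sketch:

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Consumes from the configured input topic, publishes the transformed
// value to the configured output topic; uncaught exceptions follow the
// function's configured retry / dead-letter path.
public class UppercaseFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        context.getLogger().info("processing {}", input);
        return input.toUpperCase();
    }
}
```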
Schemas & Transactions¶
- The built-in Schema Registry stores schemas per topic (compatibility policies are set per namespace); it supports Avro, JSON Schema, Protobuf, and key-value composites.
- Transactions span produces and acknowledgments across multiple topics and partitions; the Transaction Coordinator runs inside brokers and persists its state to system topics.
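A sketch of the transaction API, assuming a broker with transactions enabled (transactionCoordinatorEnabled=true in broker.conf) and a local service URL:

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.api.transaction.Transaction;

public class TxnExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .enableTransaction(true)               // broker must have transactions on
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://my-tenant/ns-prod/audit-log")
                .sendTimeout(0, TimeUnit.SECONDS) // required for transactional producers
                .create();

        Transaction txn = client.newTransaction()
                .withTransactionTimeout(5, TimeUnit.MINUTES)
                .build().get();

        // Produced under the transaction; visible to consumers only after commit.
        producer.newMessage(txn).value("order-123 shipped").send();
        txn.commit().get();

        producer.close();
        client.close();
    }
}
```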
Performance Characteristics¶
| Workload | Notes |
|---|---|
| Single broker, R3 ledger, NVMe bookies | ~150–300 MB/s sustained (varies wildly with hardware) |
| Cluster-wide aggregate | Linear with broker + bookie counts |
| Tiered-storage cold read | Adds object-store latency to first byte |
| Geo-replication lag | Typically WAN RTT + replicator commit interval |
| Pulsar Functions overhead | ~1–10 ms/event for moderate transforms |
The OpenMessaging Benchmark project publishes apples-to-apples Kafka/Pulsar/RabbitMQ comparisons. Trust measurements on your own hardware over any vendor blog.
Comparison Hooks¶
- vs Kafka — Pulsar's compute/storage split is its big differentiator; Kafka's single-binary model is simpler ops.
- vs Redpanda — Redpanda is Kafka-API-compatible and single-binary; Pulsar is multi-tenant and multi-protocol but operationally heavier.
- vs NATS — both have multi-tenant designs; NATS uses accounts, Pulsar uses tenants/namespaces. Pulsar wins on durable-stream scale, NATS on latency.