Apache Pulsar

Cloud-native messaging + streaming with segregated compute and storage, multi-tenancy from day one, and built-in geo-replication.

Overview

Apache Pulsar is a distributed pub-sub and streaming platform with a fundamentally different architecture from Kafka: brokers are stateless while storage lives in Apache BookKeeper bookies. This separation lets compute and storage scale independently, and lets the same broker fleet serve hot, cold, and tiered-to-S3 reads transparently.

Pulsar was open-sourced by Yahoo in 2016, entered the Apache Incubator in 2017, and is now developed by a broad vendor base that includes StreamNative, DataStax, Tencent, and Alibaba.

Key Facts

Attribute | Detail
Website | pulsar.apache.org
GitHub | apache/pulsar
Stars | ~14k+ ⭐
Latest Version | 4.1.x (CY 2026 stable line)
Language | Java
License | Apache-2.0
Steward | Apache Software Foundation; commercial support from StreamNative, DataStax
Wire Protocols | Pulsar protocol (native), Kafka API (KoP plugin), MQTT (MoP), AMQP (AoP)

Evaluation

Pros | Cons
Compute/storage separation scales independently | More moving parts: brokers + BookKeeper + metadata store
Multi-tenancy is native (tenants → namespaces → topics) | Operationally heavier than NATS or Redpanda
Geo-replication is a config flag (per-namespace) | Geo-replication ordering caveats need careful reading
Tiered storage to S3 / GCS / Azure Blob | Java/JVM tax (heap tuning, GC)
Pulsar Functions = stream processing in-broker | Pulsar Functions runtime needs its own ops attention
Schema Registry is built-in | KoP/MoP/AoP feature parity lags native
Subscription types: Exclusive / Failover / Shared / Key_Shared | Some Kafka tooling expects ZooKeeper / KRaft directly
Strong consistency via BookKeeper write quorums | Cursor management can be tricky on partitioned topics

Architecture

flowchart TB
    Producer["Producer / Consumer client"]
    subgraph BrokerLayer["Pulsar Brokers (stateless)"]
        Broker1["Pulsar Broker 1"]
        Broker2["Pulsar Broker 2"]
        Broker3["Pulsar Broker 3"]
    end
    subgraph BookKeeper["Apache BookKeeper (storage)"]
        Bookie1["BookKeeper Bookie 1"]
        Bookie2["BookKeeper Bookie 2"]
        Bookie3["BookKeeper Bookie 3"]
        Bookie4["BookKeeper Bookie 4"]
    end
    subgraph Metadata["Metadata layer"]
        ZK["ZooKeeper / etcd / RocksDB"]
        ConfigStore["Configuration Store (global)"]
    end
    subgraph TieredStorage["Tiered storage"]
        S3["S3 / GCS / Azure Blob"]
    end
    Producer --> Broker1
    Producer --> Broker2
    Producer --> Broker3
    Broker1 --> Bookie1
    Broker1 --> Bookie2
    Broker1 --> Bookie3
    Broker2 --> Bookie2
    Broker2 --> Bookie3
    Broker2 --> Bookie4
    Broker3 --> Bookie1
    Broker3 --> Bookie3
    Broker3 --> Bookie4
    Broker1 -.-> ZK
    Broker2 -.-> ZK
    Broker3 -.-> ZK
    ZK -.-> ConfigStore
    Bookie1 -.-> ZK
    Bookie2 -.-> ZK
    Bookie3 -.-> ZK
    Bookie4 -.-> ZK
    Broker1 --> S3
    Broker2 --> S3

See messaging/pulsar/architecture for component-level details.
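
To make the broker-to-bookie fan-out in the diagram more concrete, the following is a minimal sketch of round-robin striping as BookKeeper-style storage is usually described: each entry is written to a subset of the ensemble (the write quorum) and counts as durable once enough bookies acknowledge it (the ack quorum). The function names, bookie names, and quorum values are assumptions chosen for illustration; this is not Pulsar's or BookKeeper's actual API.

```python
def write_set(entry_id: int, ensemble: list[str], write_quorum: int) -> list[str]:
    """Return the bookies that would store one entry under round-robin striping."""
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    size = len(ensemble)
    # Start at the entry's position in the ensemble and take the next
    # write_quorum bookies, wrapping around at the end of the list.
    return [ensemble[(entry_id + i) % size] for i in range(write_quorum)]


def is_durable(acks_received: int, ack_quorum: int) -> bool:
    """An entry is treated as durable once ack_quorum bookies have confirmed it."""
    return acks_received >= ack_quorum


# Hypothetical four-bookie ensemble, matching the four bookies in the diagram.
ensemble = ["bookie-1", "bookie-2", "bookie-3", "bookie-4"]

for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=3))
```

With an ensemble of four and a write quorum of three, consecutive entries rotate across the bookies, which is why no single bookie holds every entry of a topic.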

Use Cases

  • Multi-tenant SaaS messaging — tenants → namespaces → topics with per-tenant resource quotas.
  • Geo-distributed streaming — built-in cross-region replication, no external MirrorMaker-equivalent.
  • Hybrid cloud / hybrid edge — brokers near applications, bookies near storage.
  • Independent compute/storage scaling — high-fanout consumers don't strain storage; large retention doesn't strain brokers.
  • Stream processing inside the cluster — Pulsar Functions for ETL, enrichment, simple transformations.
  • IoT message ingest — combine native Pulsar with MQTT-on-Pulsar (MoP) for device traffic.
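
The tenant → namespace → topic hierarchy in the first use case shows up directly in topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below parses such a name using only the standard library; the `TopicName` type, the `parse_topic` helper, and the example tenant are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four components."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported or missing domain in {name!r}")
    # Split at most twice so the local topic name may itself contain slashes.
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


# "acme" and "orders" are made-up tenant and namespace names.
print(parse_topic("persistent://acme/orders/created"))
```

Because the tenant and namespace are part of every topic name, quotas and policies can be attached at those levels without any naming convention on the application side.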

Licensing & Pricing

  • Apache Pulsar: Apache-2.0, free for any use.
  • StreamNative Cloud: managed cloud Pulsar (BYOC, Dedicated, Serverless tiers).
  • DataStax Astra Streaming: managed Pulsar from DataStax (DataStax was acquired by IBM in 2025).
  • Tencent Cloud TDMQ: managed Pulsar in Tencent Cloud.

Ecosystem

  • Pulsar Functions — lightweight stream-processing runtime in-broker or as separate workers.
  • Pulsar IO connectors — source/sink connectors for Kafka, JDBC, S3, Elasticsearch, MongoDB, etc.
  • Pulsar SQL — Trino-based SQL over topic data.
  • Schema Registry — built-in; supports Avro, JSON, Protobuf, and key-value composites.
  • KoP (Kafka-on-Pulsar) — Pulsar broker speaks Kafka wire protocol.
  • MoP (MQTT-on-Pulsar) — accept MQTT clients on the broker.
  • AoP (AMQP-on-Pulsar) — accept AMQP 0-9-1 clients on the broker.
  • Clients — Java (reference), Go, Python, Node.js, C++, C#, Rust.
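
As a rough illustration of how small a Pulsar Function can be, the sketch below assumes the language-native Python interface, in which a module-level function named `process` receives each message payload and its return value is published to the output topic. The normalization logic is a made-up example, and deployment (input and output topics, runtime choice) is deliberately not shown.

```python
def process(input):
    """Normalize a text payload: trim surrounding whitespace and upper-case it."""
    # The parameter is named `input` to follow the convention used in
    # Pulsar's own Python function examples; it shadows the builtin here.
    return input.strip().upper()


# Local smoke test; inside a cluster the Functions runtime would call process().
print(process("  sensor-7 \n"))  # prints "SENSOR-7"
```

Anything that needs state, windowing, or joins is usually a sign that a dedicated stream processor is a better fit than a function of this shape.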

Compatibility & Requirements

Requirement | Detail
Brokers | Java 17+; 4 vCPU + 8 GB heap baseline
Bookies | Java 17+; SSD/NVMe for journal + ledger dirs
Metadata store | ZooKeeper 3.8+, etcd 3.5+, or RocksDB (standalone)
Configuration store | Per-instance global ZooKeeper for multi-cluster
Tiered storage | S3, GCS, Azure Blob
Network | TCP 6650 (binary), 8080 (HTTP REST), 6651 (TLS), 8443 (HTTPS)
Container | Official Apache images at apachepulsar/pulsar
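
The four ports in the Network row map onto two pairs of endpoints: a binary protocol URL for clients and an HTTP URL for the admin REST API, each with a plaintext and a TLS variant. The helper below is a small sketch of that mapping; `service_urls` and the host name are hypothetical, and it assumes the default ports listed above have not been changed.

```python
def service_urls(host: str, tls: bool = False) -> dict[str, str]:
    """Build client and admin URLs for one broker, using the default ports."""
    if tls:
        return {
            "binary": f"pulsar+ssl://{host}:6651",  # TLS binary protocol
            "admin": f"https://{host}:8443",        # HTTPS REST
        }
    return {
        "binary": f"pulsar://{host}:6650",  # plaintext binary protocol
        "admin": f"http://{host}:8080",     # HTTP REST
    }


# "broker.example.internal" is a placeholder host name.
print(service_urls("broker.example.internal"))
print(service_urls("broker.example.internal", tls=True))
```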

Latest Versions

  • 4.1.x: current 2026 stable; improved load balancer, lazier topic loading, transactional consumer fixes.
  • 4.0.x: added per-broker pluggable load balancer, cursor protobuf optimizations.
  • 3.x: long-term-support (LTS) branch; ZooKeeper is the default metadata store, with etcd available as an alternative.
  • 2.10.x: older; missing many recent multi-tenant features.

Track at pulsar.apache.org/release-notes.

Alternatives

  • Apache Kafka — single-tier compute+storage; simpler model but coupled scaling.
  • Redpanda — Kafka API in C++; per-partition Raft.
  • NATS — lighter, lower-latency request-reply; account isolation rather than tenants/namespaces.
  • RabbitMQ — AMQP routing primitives; not log-replay-oriented.
  • AWS Kinesis / Azure Event Hubs / GCP Pub/Sub — managed cloud-native equivalents.

Migration & Lock-in

  • Kafka clients work via KoP plugin — but feature parity isn't 1:1.
  • Tiered storage offload format is portable — segments are stored as a known format (Apache MLP).
  • Pulsar Functions are Pulsar-specific; rewriting against Flink or Beam is non-trivial.
  • Schema Registry is wire-compatible with Confluent SR for Avro use cases (read carefully — Pulsar's KV schemas are unique).
  • Subscription type lock-in: Key_Shared semantics differ from those of any other broker; design carefully.
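
To show why Key_Shared behaviour is hard to port, here is a simplified model of key-based dispatch: each message key is hashed into a fixed range of slots, and the range is divided among the connected consumers. The CRC32 hash, the slot count, and the even split are stand-ins assumed for this sketch; Pulsar's own hash function and range-assignment policies differ in detail.

```python
import zlib

HASH_RANGE = 65536  # illustrative slot count, assumed for this sketch


def slot_for(key: str) -> int:
    """Map a message key to a slot in [0, HASH_RANGE)."""
    return zlib.crc32(key.encode("utf-8")) % HASH_RANGE


def consumer_for(key: str, consumers: list[str]) -> str:
    """Pick the consumer whose contiguous slice of the slot range holds the key."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    # Integer arithmetic keeps the index in [0, len(consumers)).
    index = slot_for(key) * len(consumers) // HASH_RANGE
    return consumers[index]


keys = ["order-1", "order-2", "order-3", "order-4"]
before = {k: consumer_for(k, ["c1", "c2"]) for k in keys}
after = {k: consumer_for(k, ["c1", "c2", "c3"]) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print("keys reassigned after adding c3:", moved)
```

While the consumer set is stable, a given key always reaches the same consumer. When a consumer joins or leaves, some keys move, and how in-flight messages for those keys are handled is exactly the part that has no direct equivalent in other brokers.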

Community Health

  • Active multi-vendor governance (StreamNative, Tencent, Alibaba, DataStax, Yahoo Japan, …).
  • Regular Apache release cadence; LTS lines.
  • Annual Pulsar Summit and active mailing lists.
  • About 14k GitHub stars on apache/pulsar and many production references at scale (Yahoo Japan, Splunk, Tencent, Verizon).

Open Questions

  • For multi-region active-active, is Pulsar's geo-replication operationally simpler than Kafka MirrorMaker 2 or Confluent Cluster Linking, and at what RPO?
  • What is the practical storage-cluster (BookKeeper) sizing rule for sustained 1 GB/s ingest with 30-day retention?
  • For Key_Shared subscriptions, what failure modes can violate key-locality (e.g. bookie loss + cursor recovery)?
  • For tiered storage, what is the time-to-first-byte for a consumer reading offloaded data vs hot storage?
  • After IBM's DataStax acquisition, what is the long-term commercial roadmap for Astra Streaming?
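
As a starting point for the BookKeeper sizing question above, the raw arithmetic can be sketched as follows. It assumes decimal units, a write quorum of 3 (an assumption, not a recommendation), and no compression, tiered-storage offload, or free-space headroom, so it gives a baseline figure rather than a sizing rule.

```python
SECONDS_PER_DAY = 86_400
GB = 10**9
PB = 10**15


def retained_bytes(ingest_bytes_per_s: int, retention_days: int, write_quorum: int) -> int:
    """Bytes held on bookies once retention is full, counting every replica."""
    return ingest_bytes_per_s * retention_days * SECONDS_PER_DAY * write_quorum


logical = retained_bytes(1 * GB, retention_days=30, write_quorum=1)
replicated = retained_bytes(1 * GB, retention_days=30, write_quorum=3)

print(f"logical data:     {logical / PB:.3f} PB")     # 2.592 PB
print(f"with 3 replicas:  {replicated / PB:.3f} PB")  # 7.776 PB
```

Offloading older segments to object storage would reduce the bookie-resident share of this total, which is one reason the tiered-storage questions here interact with the sizing one.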