Messaging
Event streaming, message queues, and pub/sub systems for asynchronous, decoupled, and real-time architectures.
Domain Scope
This domain covers brokered messaging systems used to decouple producers and consumers, durable event streams that retain history for replay and analytics, and lightweight pub/sub layers used inside distributed systems. Topics span the spectrum from log-structured event stores (Kafka, Pulsar, Redpanda) to traditional message queues (RabbitMQ) and ultra-low-overhead pub/sub fabrics (NATS).
Topics
| Topic | Style | Highlight |
|---|---|---|
| Apache Kafka | Distributed log | The de facto standard event streaming platform |
| NATS | Pub/sub + JetStream | Ultra-light multi-tenant messaging with edge-first design |
| RabbitMQ | AMQP broker | Mature feature-rich queue broker with quorum queues and streams |
| Redpanda | Kafka-compatible | C++ Kafka-API broker with no JVM and no ZooKeeper |
| Apache Pulsar | Tiered streaming | Segregated compute/storage with built-in geo-replication |
Comparisons
- Streaming Brokers Comparison — Kafka vs Redpanda vs Pulsar
- Messaging Patterns Comparison — Log-stream vs queue vs pub/sub trade-offs
When to Use Which
| Need | Reach For |
|---|---|
| Replayable event log, large ecosystem | Kafka |
| Kafka API + simpler ops, low-latency | Redpanda |
| Geo-replication built-in, multi-tenant SaaS | Pulsar |
| Lightweight pub/sub, edge & IoT | NATS (Core) |
| Persistent streams without Kafka overhead | NATS JetStream |
| Traditional work-queue, complex routing | RabbitMQ |
| Per-message TTL, priority, dead-lettering | RabbitMQ |
Decision Framework
When choosing a messaging system, evaluate along these axes:
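- Delivery guarantee: at-most-once, at-least-once, or exactly-once? Kafka and Pulsar offer transactional exactly-once processing; NATS JetStream provides at-least-once delivery with publish-side deduplication; RabbitMQ provides at-least-once delivery with publisher confirms and consumer acknowledgments.
- Retention model: do consumers need to replay historical data? Log-based systems (Kafka, Redpanda, Pulsar) retain messages according to time or size policies; classic RabbitMQ queues delete a message once it is acknowledged, and NATS Core keeps no message history at all, so a subscriber that is offline misses the message. RabbitMQ streams and NATS JetStream add log-style retention to those two systems.
- Operational complexity: Redpanda and NATS ship as a single binary; Kafka needs a metadata quorum (KRaft, or ZooKeeper on releases before 4.0); Pulsar needs BookKeeper plus a metadata store (ZooKeeper by default).
- Latency profile: NATS and Redpanda are designed for low tail latency; Kafka and Pulsar optimize for throughput at somewhat higher tail latency. Measure p99 on your own workload, since results depend heavily on configuration and hardware.
- Multi-tenancy: Pulsar has native multi-tenancy with namespace isolation; NATS has accounts; Kafka uses ACLs and quotas.
- Ecosystem maturity: Kafka has the largest connector ecosystem (Kafka Connect, 200+ connectors); Redpanda implements the Kafka protocol natively and Pulsar offers it through the Kafka-on-Pulsar (KoP) protocol handler, so both can reuse part of that ecosystem.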
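The retention-model axis above can be made concrete with a toy model. The Python sketch below is not any broker's API: `ToyLog` and `ToyQueue` are hypothetical in-memory classes that only illustrate the contrast between offset-based reads from a retained log and acknowledge-then-delete queue semantics.

```python
from collections import deque


class ToyLog:
    """Append-only log: records stay until retention trims them.

    Readers choose the offset to read from, so the same data can be replayed.
    """

    def __init__(self, max_records):
        self.max_records = max_records  # size-based retention limit (assumption)
        self.base_offset = 0            # offset of the oldest retained record
        self.records = []

    def append(self, value):
        self.records.append(value)
        # Trim the oldest records once the size limit is exceeded.
        overflow = len(self.records) - self.max_records
        if overflow > 0:
            del self.records[:overflow]
            self.base_offset += overflow
        # Return the offset assigned to the record just appended.
        return self.base_offset + len(self.records) - 1

    def read_from(self, offset):
        # Requests below the retained range are clamped to the oldest record.
        start = max(offset, self.base_offset) - self.base_offset
        return self.records[start:]


class ToyQueue:
    """Work queue: a message is removed for good once it is acknowledged."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}
        self.next_tag = 0

    def publish(self, value):
        self.ready.append(value)

    def get(self):
        if not self.ready:
            return None
        value = self.ready.popleft()
        tag = self.next_tag
        self.next_tag += 1
        self.unacked[tag] = value  # held until acked or nacked
        return tag, value

    def ack(self, tag):
        del self.unacked[tag]  # acknowledged: no replay possible

    def nack(self, tag):
        # Negative acknowledgment: put the message back at the head of the queue.
        self.ready.appendleft(self.unacked.pop(tag))
```

Reading `ToyLog` twice from the same offset returns the same records, which is the property replay and late-joining consumers depend on. `ToyQueue` returns nothing once a message has been acknowledged.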
Cross-Cutting Concerns
- Exactly-once semantics: Kafka transactions, Pulsar transactions, NATS JetStream message deduplication (Nats-Msg-Id header), RabbitMQ publisher confirms + idempotent consumers
- Schema management: Confluent Schema Registry, Apicurio, Karapace, Pulsar built-in schema registry
- Observability: OpenTelemetry messaging spans, broker-side metrics (JMX/Prometheus), end-to-end tracing
- Storage tiers: Tiered storage to object stores such as S3 (Kafka KIP-405, Pulsar offload from BookKeeper), JetStream file storage with R3 replication, RabbitMQ stream segments
- Client libraries: All five systems have Go, Java, Python, and Rust clients (official or community-maintained); Kafka and NATS have the broadest polyglot coverage
- Kubernetes operators: Strimzi (Kafka), Redpanda Operator, StreamNative (Pulsar), RabbitMQ Cluster Operator, NACK (NATS)
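Several of the exactly-once approaches listed above amount to deduplication layered on at-least-once delivery. The following is a minimal sketch of an idempotent consumer, assuming every message carries a unique ID. The class name `IdempotentConsumer`, the `window` parameter, and the in-memory ID store are illustrative assumptions, not part of any client library.

```python
from collections import OrderedDict


class IdempotentConsumer:
    """Skips messages whose ID was already processed within a bounded window."""

    def __init__(self, handler, window=1000):
        self.handler = handler      # callable applied to each new payload
        self.window = window        # how many recent IDs to remember
        self.seen = OrderedDict()   # insertion-ordered set of processed IDs

    def on_message(self, message_id, payload):
        """Process the payload once; return False for a duplicate delivery."""
        if message_id in self.seen:
            return False
        self.handler(payload)
        self.seen[message_id] = True
        # Evict the oldest ID so memory stays bounded.
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)
        return True


if __name__ == "__main__":
    processed = []
    consumer = IdempotentConsumer(processed.append, window=100)
    consumer.on_message("order-1", {"total": 10})
    consumer.on_message("order-1", {"total": 10})  # redelivery is ignored
    print(len(processed))
```

The bounded window means a duplicate that arrives after its ID has been evicted is processed again, so the window has to cover the broker's redelivery horizon. A production consumer would also need to persist the seen IDs atomically with the handler's side effect; this sketch keeps them in memory only.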
Sources
- Confluent — Apache Kafka documentation
- NATS.io documentation
- RabbitMQ documentation
- Redpanda documentation
- Apache Pulsar documentation
Kubernetes Operators
| System | Operator | CRD-Driven | Auto-Scaling |
|---|---|---|---|
| Kafka | Strimzi | Yes | Via KEDA or custom HPA |
| Redpanda | Redpanda Operator | Yes | Built-in decommission |
| Pulsar | StreamNative / Pulsar Operator | Yes | Via HPA on broker pods |
| NATS | NACK (NATS Controllers for Kubernetes; manages JetStream resources) | Yes | Manual replica count |
| RabbitMQ | RabbitMQ Cluster Operator | Yes | Via HPA on queue depth |
Open Questions
- How does Redpanda's read/write tail latency compare to Kafka in KRaft mode at >1M msgs/sec sustained?
- For multi-region active-active, is Pulsar's geo-replication operationally simpler than Kafka with MirrorMaker 2 or Confluent Cluster Linking?
- What is the practical ceiling for NATS JetStream streams per cluster, and how does it compare to Kafka topic count limits?
- What are the cost implications of Pulsar's tiered storage vs Kafka's KIP-405 tiered storage at petabyte scale?
- How do RabbitMQ Streams compare to NATS JetStream for ordered, replayable workloads under 100K msgs/sec?
- How do RabbitMQ Streams compare to Kafka for event sourcing workloads in terms of throughput and consumer group semantics?
- What is the total cost of ownership (TCO) comparison between self-hosted Kafka/Strimzi and Redpanda on equivalent hardware?
- How does using Oxia as Pulsar's metadata store (in place of ZooKeeper) affect operational complexity and failure modes?