
Messaging

Event streaming, message queues, and pub/sub systems for asynchronous, decoupled, and real-time architectures.

Domain Scope

This domain covers brokered messaging systems used to decouple producers and consumers, durable event streams that retain history for replay and analytics, and lightweight pub/sub layers used inside distributed systems. Topics span the spectrum from log-structured event stores (Kafka, Pulsar, Redpanda) to traditional message queues (RabbitMQ) and ultra-low-overhead pub/sub fabrics (NATS).
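To make the log-versus-queue distinction concrete, here is a minimal in-memory sketch. It is not any broker's API; `EventLog` and `WorkQueue` are hypothetical names. The log keeps every record and lets each consumer track its own offset, so history can be replayed. The queue hands a message to one consumer and deletes it once it is acknowledged.

```python
from collections import deque


class EventLog:
    """Append-only log: records are retained; each consumer tracks its own offset."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, consumer, max_records=10):
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)  # auto-commit the new position
        return batch

    def seek(self, consumer, offset):
        # Rewinding a consumer's offset is what makes replay possible.
        self.offsets[consumer] = offset


class WorkQueue:
    """Queue: each message goes to one consumer and is removed once acked."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}  # delivery tag -> message awaiting ack
        self.next_tag = 0

    def publish(self, message):
        self.ready.append(message)

    def get(self):
        if not self.ready:
            return None
        self.next_tag += 1
        self.unacked[self.next_tag] = self.ready.popleft()
        return self.next_tag, self.unacked[self.next_tag]

    def ack(self, tag):
        del self.unacked[tag]  # gone for good: nothing to replay

    def nack(self, tag):
        self.ready.appendleft(self.unacked.pop(tag))  # requeue for redelivery


log = EventLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)
print(log.poll("billing"))  # ['created', 'paid', 'shipped']
log.seek("billing", 0)
print(log.poll("billing"))  # ['created', 'paid', 'shipped'] again (replay)
print(log.poll("audit"))    # an independent consumer also sees all three

q = WorkQueue()
q.publish("job-1")
tag, msg = q.get()
q.ack(tag)
print(q.get())              # None: the acknowledged message is gone
```

Per-consumer offsets are what let several independent readers (billing, audit, a new analytics job) share one retained stream, which a delete-on-ack queue cannot offer.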

Topics

Topic | Style | Highlight
Apache Kafka | Distributed log | The de facto standard event streaming platform
NATS | Pub/sub + JetStream | Ultra-light multi-tenant messaging with edge-first design
RabbitMQ | AMQP broker | Mature, feature-rich queue broker with quorum queues and streams
Redpanda | Kafka-compatible | C++ Kafka-API broker with no JVM and no ZooKeeper
Apache Pulsar | Tiered streaming | Separate compute and storage layers with built-in geo-replication
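The "distributed log" style usually means a topic is split into partitions, and records that share a key are routed to the same partition so their relative order is preserved. A minimal sketch, assuming CRC32 as the hash: Kafka's default partitioner actually hashes the key bytes with murmur2, so the partition numbers here will not match a real cluster. `partition_for` and `PartitionedTopic` are hypothetical names.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash of the key bytes, reduced modulo the partition count.
    # (Kafka uses murmur2 here; CRC32 is a stand-in for illustration.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedTopic:
    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset within it)


topic = PartitionedTopic(num_partitions=6)
for key, value in [("order-42", "created"), ("order-7", "created"),
                   ("order-42", "paid"), ("order-42", "shipped")]:
    topic.produce(key, value)

# All events for order-42 land in one partition, in the order they were produced.
p = partition_for("order-42", 6)
print([v for k, v in topic.partitions[p] if k == "order-42"])
# ['created', 'paid', 'shipped']
```

Ordering is only guaranteed within a partition, which is why keys are chosen so that events that must stay ordered (here, one order's lifecycle) share a key.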

Comparisons

When to Use Which

Need | Reach For
Replayable event log, large ecosystem | Kafka
Kafka API + simpler ops, low latency | Redpanda
Built-in geo-replication, multi-tenant SaaS | Pulsar
Lightweight pub/sub, edge & IoT | NATS (Core)
Persistent streams without Kafka overhead | NATS JetStream
Traditional work queue, complex routing | RabbitMQ
Per-message TTL, priority, dead-lettering | RabbitMQ
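The last row bundles three RabbitMQ queue features. The sketch below models their semantics in memory; it is not the RabbitMQ client API, and `ToyQueue` and its parameters are hypothetical. Higher-priority messages are delivered first, a message whose TTL has elapsed is moved to a dead-letter list instead of being delivered, and a message rejected without requeue is dead-lettered too. In RabbitMQ itself these are configured through message properties (`expiration`, `priority`) and queue arguments such as `x-max-priority` and `x-dead-letter-exchange`.

```python
import heapq
import itertools
import time


class ToyQueue:
    """In-memory model of priority + per-message TTL + dead-lettering."""

    def __init__(self, clock=time.monotonic):
        self._heap = []                # entries: (-priority, seq, expires_at, body)
        self._seq = itertools.count()  # FIFO tie-break within one priority level
        self.dead_letters = []         # stands in for a dead-letter queue
        self._clock = clock

    def publish(self, body, priority=0, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        heapq.heappush(self._heap, (-priority, next(self._seq), expires_at, body))

    def get(self):
        """Return the highest-priority live message, dead-lettering expired ones."""
        while self._heap:
            _, _, expires_at, body = heapq.heappop(self._heap)
            if expires_at is not None and self._clock() >= expires_at:
                self.dead_letters.append((body, "expired"))
                continue
            return body
        return None

    def reject(self, body):
        # Reject without requeue: route the message to the dead-letter queue.
        self.dead_letters.append((body, "rejected"))


now = [0.0]  # fake clock so the example is deterministic
jobs = ToyQueue(clock=lambda: now[0])
jobs.publish("low", priority=1)
jobs.publish("urgent", priority=9)
jobs.publish("stale", priority=5, ttl_seconds=2)
now[0] = 5.0                 # advance past the TTL of "stale"
print(jobs.get())            # urgent
print(jobs.get())            # low ("stale" expired and was dead-lettered)
print(jobs.dead_letters)     # [('stale', 'expired')]
```

The unique sequence number in each heap entry keeps equal-priority messages in FIFO order and ensures tuples never need to compare the (possibly `None`) expiry field.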

Decision Framework

When choosing a messaging system, evaluate along these axes:

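  1. Delivery guarantee: at-most-once, at-least-once, or exactly-once? Kafka and Pulsar offer transactional exactly-once for read-process-write pipelines within the broker; NATS JetStream provides at-least-once delivery plus deduplication of re-published messages; RabbitMQ provides at-least-once when publisher confirms are combined with manual consumer acknowledgments.
  2. Retention model: do consumers need to replay historical data? Log-based systems (Kafka, Redpanda, Pulsar) retain records by time or size regardless of consumption; RabbitMQ classic and quorum queues delete a message once it is acknowledged, and NATS Core does not persist messages at all (JetStream and RabbitMQ Streams add log-style retention).
  3. Operational complexity: Redpanda and NATS ship as single binaries; Kafka also needs a metadata quorum (KRaft controllers, or ZooKeeper in releases before 4.0); Pulsar needs brokers plus BookKeeper and a metadata store (traditionally ZooKeeper).
  4. Latency profile: NATS and Redpanda target sub-millisecond p99; Kafka and Pulsar optimize for throughput at the cost of somewhat higher tail latency.
  5. Multi-tenancy: Pulsar has native multi-tenancy (tenants and namespaces with isolation policies); NATS isolates tenants with accounts; Kafka relies on ACLs and quotas.
  6. Ecosystem maturity: Kafka has the largest connector ecosystem (Kafka Connect, 200+ connectors); Redpanda implements the Kafka API natively and Pulsar exposes it through the Kafka-on-Pulsar (KoP) protocol handler, so some Kafka tooling can be reused.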
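"At-least-once with dedup" works because a redelivered or re-published message carries the same ID, so a repeat inside a time window can be dropped. NATS JetStream does this on the publish side using the `Nats-Msg-Id` header and the stream's duplicate window; the sketch below models the same idea on the consumer side, in memory. `DedupWindow` and `consume` are hypothetical names, and a real consumer would persist the seen-ID state alongside its side effects.

```python
import time
from collections import OrderedDict


class DedupWindow:
    """Remember message IDs for `window_seconds` so repeats can be skipped."""

    def __init__(self, window_seconds=120.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.seen = OrderedDict()  # msg_id -> first-seen time, oldest first

    def _evict_expired(self, now):
        while self.seen:
            first_seen = next(iter(self.seen.values()))
            if now - first_seen < self.window:
                break
            self.seen.popitem(last=False)  # drop the oldest ID

    def contains(self, msg_id):
        self._evict_expired(self.clock())
        return msg_id in self.seen

    def mark(self, msg_id):
        self.seen.pop(msg_id, None)        # keep insertion order == time order
        self.seen[msg_id] = self.clock()


def consume(deliveries, dedup, handle):
    """At-least-once input + dedup: each ID is handled once within the window."""
    for msg_id, payload in deliveries:
        if dedup.contains(msg_id):
            continue  # duplicate: skip the side effect (a real consumer still acks)
        handle(payload)      # if this raises, the ID is not marked,
        dedup.mark(msg_id)   # so a redelivery will be retried


handled = []
dedup = DedupWindow(window_seconds=120)
# The broker redelivers "m2" (for example, because the first ack was lost).
consume([("m1", "a"), ("m2", "b"), ("m2", "b"), ("m3", "c")], dedup, handled.append)
print(handled)  # ['a', 'b', 'c']
```

Marking the ID only after `handle` succeeds preserves at-least-once behavior: a crash mid-handling leads to a retry rather than a silently lost message.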

Cross-Cutting Concerns

  • Exactly-once semantics: Kafka transactions, Pulsar transactions, NATS JetStream message deduplication (Nats-Msg-Id) with double acknowledgment, RabbitMQ publisher confirms + idempotent consumers
  • Schema management: Confluent Schema Registry, Apicurio, Karapace, Pulsar built-in schema registry
  • Observability: OpenTelemetry messaging spans, broker-side metrics (JMX/Prometheus), end-to-end tracing
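  • Storage tiers: tiered storage to object stores such as S3 (Kafka KIP-405, Pulsar offloading of BookKeeper ledgers); JetStream relies on file storage replicated across nodes (for example R3) rather than tiering; RabbitMQ streams store data in on-disk segment files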
  • Client libraries: All five systems have Go, Java, Python, and Rust clients; Kafka and NATS have the broadest polyglot coverage
  • Kubernetes operators: Strimzi (Kafka), Redpanda Operator, StreamNative (Pulsar), RabbitMQ Cluster Operator, NACK (NATS)
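For end-to-end tracing, the usual pattern is to carry the W3C Trace Context `traceparent` value in message headers so the consumer's span can be linked to the producer's. In practice the OpenTelemetry SDK's propagators do this for you; the stdlib-only sketch below just shows the header format (`version-traceid-parentid-flags`) and the inject/extract round trip. `inject` and `extract` here are hypothetical helpers, and the parser accepts only version `00` as a simplification.

```python
import re
import secrets

# version 00 only (simplification); lowercase hex as required by the spec
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Producer side: write the current span context into message headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers: dict):
    """Consumer side: return (trace_id, parent_span_id, sampled) or None."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, parent_id, (int(flags, 16) & 0x01) == 1


# Producer: a 16-byte trace ID and an 8-byte span ID, hex-encoded.
trace_id, producer_span = secrets.token_hex(16), secrets.token_hex(8)
msg_headers = inject({}, trace_id, producer_span)

# Consumer: continue the same trace; the producer's span becomes the parent.
ctx = extract(msg_headers)
print(ctx[0] == trace_id, ctx[1] == producer_span)  # True True
```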


Kubernetes Operators

System | Operator | CRD-Driven | Auto-Scaling
Kafka | Strimzi | Yes | Via KEDA or custom HPA
Redpanda | Redpanda Operator | Yes | Built-in decommission
Pulsar | StreamNative / Pulsar Operator | Yes | Via HPA on broker pods
NATS | NACK (NATS Controllers for Kubernetes, manages JetStream streams and consumers) | Yes | Manual replica count
RabbitMQ | RabbitMQ Cluster Operator | Yes | Via HPA on queue depth
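Several rows rely on a Horizontal Pod Autoscaler driven by a metric such as queue depth or consumer lag (signals that are often used to scale consumer Deployments rather than broker pods). The HPA's documented core rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, clamped to the configured min and max and skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that arithmetic; it ignores stabilization windows and scaling policies, and `desired_replicas` and the numbers are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling rule (no stabilization window or rate-limiting policies)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 3 pods, average queue depth of 2,500 messages per pod, target 1,000 per pod:
# ratio 2.5 -> ceil(3 * 2.5) = 8 replicas.
print(desired_replicas(3, current_value=2500, target_value=1000))  # 8
```

The tolerance band keeps the autoscaler from flapping when the metric hovers near its target.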

Open Questions

  • How does Redpanda's read/write tail latency compare to Kafka in KRaft mode at a sustained rate above 1M msgs/sec?
  • For multi-region active-active, is Pulsar's geo-replication operationally simpler than Kafka MirrorMaker 2.0 or Confluent Cluster Linking?
  • What is the practical ceiling for NATS JetStream streams per cluster, and how does it compare to Kafka topic count limits?
  • What are the cost implications of Pulsar's tiered storage vs Kafka's KIP-405 tiered storage at petabyte scale?
  • How do RabbitMQ Streams compare to NATS JetStream for ordered, replayable workloads under 100K msgs/sec?
  • How do RabbitMQ Streams compare to Kafka for event sourcing workloads in terms of throughput and consumer group semantics?
  • What is the total cost of ownership (TCO) comparison between self-hosted Kafka/Strimzi and Redpanda on equivalent hardware?
  • How does Pulsar's Oxia metadata store (replacing ZooKeeper) affect operational complexity and failure modes?