Apache Kafka¶
The de facto standard distributed event streaming platform — durable replicated logs, exactly-once semantics, KRaft consensus, and a deep ecosystem (Connect, Streams, ksqlDB, Schema Registry).
Overview¶
Apache Kafka is an open-source distributed event streaming platform originally developed at LinkedIn (2011) and donated to the Apache Software Foundation. It stores events in durable, partitioned, replicated commit logs organized into topics, with producers appending records and consumers reading at their own pace via durable per-group offsets. Since Kafka 4.0 (March 2025), the platform runs KRaft-only — Apache ZooKeeper has been fully removed and metadata is now managed by an internal Raft quorum of controllers.
Kafka's value proposition rests on four properties: (1) high throughput per broker thanks to sequential disk I/O, page-cache reliance, and zero-copy sendfile(), (2) durable replication via the in-sync-replica (ISR) protocol with optional Eligible Leader Replicas (ELR, KIP-966), (3) exactly-once semantics (EOS) for read-process-write workloads using idempotent producers + transactions, and (4) a massive ecosystem (Kafka Connect for integration, Kafka Streams for stateful processing, ksqlDB for SQL on streams, Schema Registry for data contracts, MirrorMaker 2 for cross-cluster replication, and tiered storage for cost-efficient long retention).
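The idempotent-producer half of EOS can be pictured as broker-side deduplication keyed on producer ID and per-partition sequence number: a retried batch carries the same sequence number, so the broker drops it instead of appending twice. The sketch below is a conceptual Python model of that rule, not Kafka's actual broker code.

```python
# Conceptual model of idempotent-producer deduplication: the broker tracks
# the last sequence number accepted per producer and rejects duplicates
# caused by producer retries. A sketch of the idea, not the real broker.

class PartitionLog:
    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last accepted sequence number

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        """Append a record; return False if it is a duplicate retry."""
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:       # already accepted: a duplicate retry, drop it
            return False
        if seq != last + 1:   # gap: out-of-order delivery, reject loudly
            raise ValueError("out-of-order sequence")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True

log = PartitionLog()
log.append(producer_id=1, seq=0, value="a")
log.append(producer_id=1, seq=1, value="b")
duplicate = log.append(producer_id=1, seq=1, value="b")  # network retry of seq 1
print(log.records, duplicate)  # ['a', 'b'] False
```

Transactions layer atomic multi-partition commit on top of this per-partition dedup, which is why EOS requires both pieces.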
Key Facts¶
| Attribute | Detail |
|---|---|
| Website | kafka.apache.org |
| GitHub Stars | ~30k (apache/kafka) |
| Latest Version | 4.2.0 (released 2026-02-17), 4.1.2 patch line |
| Language | Java + Scala (broker), Java/Go/Python/.NET/Rust clients |
| License | Apache License 2.0 (permissive) |
| Origin / Maintainer | Originated at LinkedIn (2011), Apache Software Foundation TLP since 2012 |
| Primary Vendor | Confluent (founded 2014 by original Kafka authors) |
| Coordination | KRaft (Raft) — ZooKeeper removed in 4.0 |
| Wire Protocol | Binary TCP, schema versioned per ApiKey |
| Storage Format | Append-only segmented log, magic v2 record batches |
Evaluation¶
Pros¶
| Pro | Detail |
|---|---|
| Massive ecosystem | Connect, Streams, ksqlDB, Schema Registry, hundreds of connectors |
| Durability + replication | ISR protocol, configurable min.insync.replicas, ELR (KIP-966) |
| Exactly-once semantics | Idempotent producer + transactions across topics/partitions |
| High throughput | LinkedIn measured 2M writes/sec on 3 commodity machines (2014) |
| Replayable history | Consumers re-read from any offset; durable for compliance and reprocessing |
| Tiered storage (KIP-405) | GA in 3.6, offload cold segments to S3/GCS/HDFS — cuts storage cost dramatically |
| Truly open | Apache 2.0, no BSL/SSPL relicensing risk like Redis or Elastic |
| Multi-language clients | Official Java + many high-quality community clients (librdkafka, sarama, confluent-kafka-python) |
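The "replayable history" row deserves a concrete picture: each consumer group owns only an offset per partition, so rewinding a group replays history without touching other groups. A minimal in-memory sketch of that model (not a client API):

```python
# Sketch of per-group offset tracking: two consumer groups read the same
# partition independently, and a group can rewind to any offset to replay
# history. Conceptual model only; real clients commit offsets to Kafka.

partition = ["evt-0", "evt-1", "evt-2", "evt-3"]
offsets = {"group-a": 0, "group-b": 0}  # committed offset per group

def poll(group: str, max_records: int = 2):
    start = offsets[group]
    batch = partition[start:start + max_records]
    offsets[group] = start + len(batch)  # commit after processing
    return batch

first = poll("group-a")     # ["evt-0", "evt-1"]
second = poll("group-a")    # ["evt-2", "evt-3"]
offsets["group-a"] = 0      # seek back to the beginning (replay)
replayed = poll("group-a")  # ["evt-0", "evt-1"] again
other = poll("group-b")     # group-b's position is unaffected
```

This is what makes reprocessing and compliance audits cheap: replay is a pointer move, not a data copy.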
Cons¶
| Con | Detail |
|---|---|
| Operational complexity | Topic design, partition planning, ISR shrinkage, broker bounce procedures all require expertise |
| JVM-bound brokers | Page cache and GC tuning matter; competing brokers (Redpanda) avoid the JVM entirely |
| Partition count limits | Per-broker partition count is a real ceiling (10k–200k depending on hardware/version) |
| Consumer rebalance pain | Before KIP-848 (GA in 4.0), rebalances could stop the whole group; eager rebalancing is still common in older clients |
| Cross-region replication | MirrorMaker 2 is solid but not transparent — offset translation, lag, and cutover require care |
| Schema management not built-in | Confluent Schema Registry, Apicurio, or Karapace must be deployed separately |
| Cold-start latency | New consumers in a large group can wait seconds to minutes before processing |
Architecture (Summary)¶
```mermaid
flowchart LR
    subgraph Producers["Producers"]
        P1["KafkaProducer<br/>(idempotent)"]
        P2["KafkaProducer<br/>(transactional)"]
    end
    subgraph KafkaCluster["Kafka Cluster (KRaft)"]
        direction TB
        subgraph ControllerQuorum["Controller Quorum (Raft)"]
            KC1["KafkaController 1"]
            KC2["KafkaController 2 (active)"]
            KC3["KafkaController 3"]
        end
        subgraph Brokers["Broker Pool"]
            KS1["KafkaServer 1<br/>LogManager / ReplicaManager"]
            KS2["KafkaServer 2<br/>LogManager / ReplicaManager"]
            KS3["KafkaServer 3<br/>LogManager / ReplicaManager"]
        end
        ControllerQuorum -- "metadata log<br/>__cluster_metadata" --> Brokers
    end
    subgraph RemoteStorage["Tiered Storage (KIP-405)"]
        S3["RemoteStorageManager<br/>(S3 / GCS / HDFS)"]
    end
    subgraph Consumers["Consumer Groups"]
        CG1["ConsumerGroup A"]
        CG2["ConsumerGroup B<br/>(transactional read_committed)"]
    end
    P1 -- "Produce v9" --> KS1
    P2 -- "Produce v9 (txn)" --> KS2
    KS1 -- "Replicate (Fetch)" --> KS2
    KS2 -- "Replicate (Fetch)" --> KS3
    KS1 -- "Cold segment upload" --> S3
    CG1 -- "Fetch v15" --> KS2
    CG2 -- "Fetch v15" --> KS3
    style ControllerQuorum fill:#1f3a5f,color:#fff
    style KS2 fill:#0d6e0d,color:#fff
```
Detailed architecture, KRaft consensus internals, replication protocol, log format, and benchmarks live in messaging/kafka/architecture.
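The replication arrows in the diagram reduce to one rule: the leader's high watermark (the offset up to which records are exposed to consumers) is the minimum log-end offset across the in-sync replicas. A sketch under that simplification (the real protocol also handles leader epochs, fencing, and ELR):

```python
# Simplified ISR high-watermark rule: the high watermark advances to the
# minimum log-end offset (LEO) among in-sync replicas, so consumers only
# see records every ISR member has replicated. Sketch, not broker code.

def high_watermark(leo_by_replica: dict, isr: set) -> int:
    return min(leo_by_replica[r] for r in isr)

leos = {"broker-1": 120, "broker-2": 118, "broker-3": 95}

hw_all = high_watermark(leos, {"broker-1", "broker-2", "broker-3"})  # 95
# broker-3 falls behind and is dropped from the ISR; the HW can now advance.
hw_shrunk = high_watermark(leos, {"broker-1", "broker-2"})           # 118
print(hw_all, hw_shrunk)
```

This is also why `min.insync.replicas` matters: it bounds how far the ISR may shrink before producers with `acks=all` start failing.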
Use Cases¶
| Use Case | Why Kafka Fits |
|---|---|
| Event sourcing | Durable, replayable, ordered per partition |
| Microservices messaging backbone | Decouples producers/consumers, supports fan-out and back-pressure |
| Real-time stream processing | Kafka Streams, Flink, Spark Structured Streaming, ksqlDB integrations |
| CDC (change data capture) | Debezium connectors stream Postgres/MySQL/Mongo bin-log events into topics |
| Log aggregation | Replaces Scribe/Flume with replicated storage and replay |
| Metrics & telemetry transport | OpenTelemetry exporters, Prometheus remote-write to Kafka |
| Data lake ingestion | Connect S3 sink, Iceberg sink, Delta Lake bridge |
| Audit trail / immutable event log | Compacted topics + retention policies |
| Activity stream / clickstream | Original LinkedIn use case; high-cardinality, partitioned by user/session |
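Partitioning by user or session (the clickstream row above) relies on the client's default partitioner: records with the same non-null key always hash to the same partition. Below is a Python transcription of the murmur2 hash the Java client uses for this, offered as an illustrative sketch rather than a supported API.

```python
# Python transcription of the murmur2 hash used by the Java client's
# default partitioner. Same key -> same partition, which preserves
# per-key ordering. Sketch for illustration only.

def murmur2(data: bytes) -> int:
    m, r = 0x5BD1E995, 24
    h = (0x9747B28C ^ len(data)) & 0xFFFFFFFF
    for i in range(0, len(data) - len(data) % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = (h * m) & 0xFFFFFFFF
        h ^= k
    tail = data[len(data) - len(data) % 4:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h

def partition_for(key: bytes, num_partitions: int) -> int:
    # Mask off the sign bit, as the Java client does, then take the modulus.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions

p = partition_for(b"user-42", 12)
assert p == partition_for(b"user-42", 12)  # deterministic per key
```

The flip side is the usual caveat: increasing the partition count changes the modulus, so key-to-partition assignments shift for new records.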
Licensing & Pricing¶
- Apache Kafka: Apache License 2.0 — fully free, no usage restrictions, no telemetry call-home, vendor-neutral.
- Confluent Platform: Confluent Community License for some components (Schema Registry, REST Proxy historically); Confluent Enterprise (commercial) adds RBAC, audit logging, tiered storage UI, Cluster Linking, Control Center.
- Confluent Cloud: Fully managed SaaS, priced per ingress GB / egress GB / partition-hour / cluster-type (Basic, Standard, Enterprise, Dedicated). Kora engine (Confluent's cloud-native rewrite) underlies Dedicated tier.
- Managed alternatives: Amazon MSK (per-broker-hour + storage), MSK Serverless (per-partition-hour + GB), Aiven for Kafka, Instaclustr Managed Kafka, Azure Event Hubs Kafka API endpoint, Upstash Kafka.
- Self-hosted on K8s: Strimzi (CNCF Sandbox, free) and the Bitnami Helm chart (note: Bitnami removed the public catalog free tier in July 2025).
License clarity
Unlike HashiCorp Vault (BSL 1.1), Redis (RSALv2/SSPLv1), Elastic (Elastic License v2), or MongoDB (SSPL), Apache Kafka has not been relicensed and remains pure Apache 2.0. Confluent's enterprise add-ons are separately licensed but the core broker is unaffected.
Ecosystem¶
| Component | Purpose |
|---|---|
| Kafka Connect | Source/sink integration framework — JDBC, S3, Elastic, Mongo, Snowflake, BigQuery, Iceberg |
| Kafka Streams | Embedded JVM library for stateful stream processing (KStream, KTable, joins, windowing) |
| ksqlDB | SQL-on-streams engine (Confluent Community License) |
| Schema Registry | Avro/JSON-Schema/Protobuf schema storage with compatibility checks (Confluent, Apicurio, Karapace) |
| MirrorMaker 2 | Connect-based cross-cluster replication (KIP-382, supersedes MM1) |
| Cruise Control | LinkedIn's automated rebalancing and self-healing controller |
| Strimzi | CNCF Sandbox K8s operator with full Kafka CRDs |
| Debezium | CDC connectors built on Kafka Connect |
| librdkafka | C/C++ client library (used by Python, Go, .NET wrappers) |
| kcat (kafkacat) | Swiss-army CLI for produce/consume/metadata |
| Conduktor / Kafka UI / AKHQ / Redpanda Console | Web UIs for cluster admin and topic browsing |
| Streams Replication Manager | Cloudera commercial replication |
Compatibility & Requirements¶
- JDK: Java 17 LTS or Java 21 LTS (Java 11 dropped in 4.0).
- Operating System: Linux strongly preferred for production (`sendfile`, page cache tuning, `epoll`); macOS for dev only; Windows broker not supported in production.
- Hardware: SSD strongly recommended for the log directory; 10GbE+ networking; >=32 GiB RAM per broker for sizable workloads (page cache).
- Filesystem: XFS recommended over ext4; avoid network-attached storage for active log dirs (tiered storage is the right answer for cold data).
- Container support: Official `apache/kafka:4.2.0` image (KRaft-native); also widely deployed via Strimzi / Confluent images.
- Wire protocol compatibility: Newer brokers accept older client API versions; clients should match or be newer than the broker for best feature support. Mixed-version cluster upgrades are supported via the `inter.broker.protocol.version` setting (now phased out in pure-KRaft 4.x).
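For orientation, a combined-mode KRaft node needs only a handful of settings. The fragment below is an illustrative minimal `server.properties` for a single-node dev cluster; the node ID, ports, and log directory are placeholder values, not recommendations.

```properties
# Minimal single-node KRaft config (dev only); illustrative values.
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
advertised.listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kraft-combined-logs
```

Production clusters split roles (`process.roles=broker` or `controller`) and run a dedicated three- or five-node controller quorum.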
Latest Versions¶
| Version | Release | Highlights |
|---|---|---|
| 4.2.0 | 2026-02-17 | Share Groups (Queues, KIP-932) GA; Streams Rebalance Protocol GA; DLQ in exception handlers |
| 4.1.x | 2025-09 → 2026 | Share Groups preview; Streams Rebalance (KIP-1071) early access; KIP-877 plugin metrics |
| 4.0.0 | 2025-03-18 | KRaft-only (ZooKeeper removed); Consumer Rebalance Protocol (KIP-848) GA; Eligible Leader Replicas (KIP-966 part 1); Java 11 dropped |
| 3.9.x | 2024–2025 | Last ZK-supporting line; CVE-2025-27817/27818/27819 backports landed in 3.9.1 |
| 3.6.0 | 2023-10 | Tiered Storage (KIP-405) GA |
| 3.3.0 | 2022-10 | KRaft mode declared production-ready |
Alternatives¶
| Alternative | Style | When to Prefer |
|---|---|---|
| Redpanda | Kafka API, C++ broker, no JVM, no ZK | Lower tail latency, simpler ops, single-binary deploy |
| Apache Pulsar | Tiered (compute / BookKeeper storage) | Built-in geo-replication, multi-tenant SaaS, native tiered storage |
| NATS / JetStream | Lightweight pub/sub + stream | Edge / IoT, ultra-low overhead, simpler operational model |
| RabbitMQ | AMQP broker | Per-message TTL, complex routing, traditional work-queues, priority |
| AWS Kinesis Data Streams | Managed shard-based stream | All-in on AWS; smaller scale; no Kafka API |
| Google Pub/Sub | Managed at-least-once pub/sub | All-in on GCP; serverless scaling; weaker ordering |
| Azure Event Hubs | Managed, Kafka-protocol-compatible | All-in on Azure; gateway speaks Kafka wire protocol |
| WarpStream | S3-native Kafka-compatible | Object-storage-only architecture; pay only for storage |
| AutoMQ | S3-native Kafka fork | Cloud-native cost optimization, Kafka API |
See Streaming Brokers Comparison for a head-to-head.
Migration & Lock-in¶
- API surface: The Kafka wire protocol is open; Redpanda, WarpStream, AutoMQ, Azure Event Hubs, and others reimplement it. Migrating Kafka clients to a wire-compatible alternative usually requires only a `bootstrap.servers` change.
- Cross-cluster migration: MirrorMaker 2 (Connect-based) is the canonical tool — it replicates topic data, consumer offsets (via the Checkpoint connector), heartbeats, and ACLs. Confluent Cluster Linking is a more transparent commercial alternative that preserves offsets exactly.
- Connector ecosystem lock-in: Some Confluent-licensed connectors (e.g. some premium cloud sinks) won't run outside Confluent Platform; standard open-source connectors (Debezium, JDBC, S3) port freely.
- Operational lock-in: Tiered storage configuration is portable, but the active `RemoteLogMetadataManager` topic isn't trivially moved — plan for a clean cutover.
- Schema lock-in: Confluent Schema Registry's wire format includes a 5-byte magic-byte + schema-ID prefix; Apicurio and Karapace implement the same wire format for compatibility.
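The Schema Registry wire format mentioned above is small enough to show directly: one magic byte (0x00) followed by a 4-byte big-endian schema ID, then the serialized payload. A minimal Python parser as a sketch:

```python
import struct

# Confluent Schema Registry wire format: 1 magic byte (0x00) + 4-byte
# big-endian schema ID, then the serialized payload. Apicurio and
# Karapace emit the same framing for compatibility.

def split_framed(message: bytes):
    """Return (schema_id, payload) from a framed message."""
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not Confluent-framed")
    (schema_id,) = struct.unpack(">i", message[1:5])
    return schema_id, message[5:]

framed = b"\x00" + struct.pack(">i", 42) + b'{"user":"alice"}'
schema_id, payload = split_framed(framed)
print(schema_id, payload)  # 42 b'{"user":"alice"}'
```

Because the framing is shared, consumers can resolve the schema ID against whichever registry implementation the target platform runs.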
Community Health¶
- Apache TLP since October 2012; one of the most active TLPs by commit volume.
- KIP (Kafka Improvement Proposal) process — public design docs voted on by PMC. Recent flagship KIPs include KIP-848 (consumer rebalance), KIP-932 (share groups / queues), KIP-405 (tiered storage), KIP-714 (client telemetry), KIP-966 (ELR), and KIP-1071 (Streams rebalance).
- Quarterly minor releases; ~12 months of community support per minor.
- Confluent employs many committers but the project remains independently governed by the Apache Software Foundation.
- Strong third-party ecosystem: Strimzi (CNCF Sandbox), Debezium (Red Hat), Cruise Control (LinkedIn), Aiven, Instaclustr, AWS MSK, Azure Event Hubs.
Sources¶
- Apache Kafka — official documentation
- Apache Kafka 4.2.0 release announcement
- Apache Kafka 4.0.0 release announcement
- Confluent Platform documentation
- Apache Kafka GitHub
- Confluent — Apache Kafka Performance and Test Results
- LinkedIn — Benchmarking Apache Kafka: 2 Million Writes per Second
- Apache Kafka CVE list
- Strimzi (Kubernetes Operator)
- KIP index (Apache Wiki)
Open Questions¶
- Q: Is Share Groups (KIP-932) ready to replace RabbitMQ for queue-style workloads? — GA in 4.2 (Feb 2026), but message-level acknowledgements and DLQ semantics are new; mature shops should evaluate on staging before betting production traffic.
- Q: When does tiered storage actually pay off vs. just over-provisioning local disk? — Generally once active retention exceeds ~7 days at >100 MB/s sustained write, and especially when egress patterns are mostly tail reads (page-cache hits) with occasional historical replays.
- Q: How do KRaft controller failovers compare operationally to ZK failovers in real production? — Anecdotally faster (sub-second metadata propagation in many cases) and simpler to operate, but few public head-to-head latency studies exist for very large clusters (>200 brokers).
- Q: What is the practical maximum partitions-per-broker on KRaft 4.x? — Pre-KRaft, the rule of thumb was ~4k partitions per broker before metadata propagation became painful. KRaft and ELR raise this substantially; published numbers from Confluent/AWS suggest 200k+ per cluster, but per-broker budgets depend heavily on hardware and replication factor.
- Q: Is exactly-once semantics still expensive enough to avoid by default? — In Kafka 4.x with idempotent producer enabled by default, the marginal cost of EOS is much smaller than in 0.11–2.x; for read-process-write Streams jobs it is usually the right default.
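The tiered-storage break-even question above can be sanity-checked with back-of-envelope arithmetic; the inputs (100 MB/s, 7 days, replication factor 3) are the illustrative thresholds from that question, not a universal rule.

```python
# Back-of-envelope local disk needed to keep retention entirely on brokers.
# Inputs are the illustrative thresholds from the question above.
write_rate_mb_s = 100     # sustained producer throughput
retention_days = 7
replication_factor = 3

raw_tb = write_rate_mb_s * 86_400 * retention_days / 1_000_000  # MB -> TB
total_tb = raw_tb * replication_factor
print(f"{raw_tb:.2f} TB raw, {total_tb:.2f} TB across replicas")
```

At roughly 180 TB of replicated hot storage for a single week, offloading cold segments to object storage (where replication is the store's problem) starts to dominate the cost picture.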