Apache Kafka
A distributed, partitioned, replicated append-only log — the default backbone for high-throughput event streaming, decoupling services, and feeding many consumers from one durable stream.
Also worth naming: Confluent Platform · Amazon MSK · Redpanda (API-compatible) · Azure Event Hubs (Kafka API)
Kafka is not a queue you drain — it is a durable log you replay. Once you internalise "append-only log, partitioned for scale, consumed by offset," most of its behaviour in an interview follows from first principles.
What it is
Kafka is a distributed commit log. Producers append records to topics; each topic is split into partitions, and each partition is an ordered, append-only sequence where every record gets a monotonically increasing offset. Consumers read forward from an offset they control — and crucially, reading does not delete the data. Records stay for a configured retention (hours, days, or forever with compaction), so many independent consumers can read the same stream, and a broken consumer can rewind and replay.
That one design choice — a retained log rather than a drained queue — is what makes Kafka the backbone of event-driven systems. The same orders topic can feed billing, search indexing, analytics, and fraud detection, each as an independent consumer group with its own offset, without the producer knowing any of them exist.
Kafka's other defining trait is throughput. By appending sequentially to disk, batching, and letting consumers pull, a single broker handles hundreds of thousands to millions of messages per second. In an interview, Kafka is the answer when you need a durable, replayable, high-volume event stream — and the wrong answer when you need a simple task queue or low-latency request/response.
When to reach for it
Reach for this when…
- You need a durable, replayable event stream that many independent consumers read
- High write throughput — clickstreams, logs, metrics, IoT, CDC change events
- Decoupling services with an event backbone (publish once, many subscribers)
- Stream processing or event sourcing where ordered, retained history matters
Not really this pattern when…
- You need a simple per-message task queue with easy retries and a DLQ (SQS / RabbitMQ are lighter)
- You need low-latency synchronous request/response (that is an RPC, not a log)
- Per-message TTL, priority ordering, or arbitrary delete — Kafka is a log, not a queue
- Tiny scale where running a Kafka cluster is more operational cost than the problem warrants
How it works
Four ideas explain almost everything Kafka does:
1. The partition is the unit of order and parallelism. Ordering is guaranteed within a partition, never across partitions. The producer chooses a partition by hashing a key (e.g. user_id), so all records for one key land in one partition and stay ordered. This is the single most important design decision: pick the key whose ordering you need to preserve, and accept that throughput is capped by partition count.
Producers append to the end of a partition; each record gets a monotonic offset. Consumers track their own offset and read forward — the log is not deleted on read, so many independent consumers replay the same data.
2. Consumer groups give you scaling and fan-out at once. Within a group, each partition is owned by exactly one consumer, so you scale a workload by adding consumers up to the partition count. Add a second group and it reads the whole topic independently with its own offsets — that is how one stream feeds many systems.
Within one consumer group, each partition is owned by exactly one consumer — so max parallelism equals the partition count. Different groups read the same topic independently with their own offsets, which is how one event stream feeds many systems.
3. Replication is how a committed record survives failure. Each partition has one leader and N followers; the in-sync replicas (ISR) are the replicas, leader included, that are fully caught up. With acks=all and min.insync.replicas=2, a write is acknowledged only once it is on at least two brokers, so a single broker dying loses nothing. Lower the acks for speed and you trade away that guarantee.
Producers write to the partition leader; followers replicate. A write is committed once all in-sync replicas (ISR) have it. acks=all + min.insync.replicas trades latency for the guarantee that a committed record survives a broker failure.
4. Consumers pull and track their own offset. Kafka does not push or track per-message acks like a traditional broker. The consumer commits an offset meaning "I've processed up to here." Commit after the side effect and a crash re-delivers (at-least-once); commit before and a crash drops (at-most-once). This is why Kafka delivery is at-least-once in practice and consumers must be idempotent.
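The first two ideas can be sketched with a small in-memory model. This is an illustration only, not Kafka client code: `partition_for` uses `zlib.crc32` as a stand-in for Kafka's own key hash, and `assign_partitions` is a simplified round-robin rather than any of Kafka's real assignors. All names here are hypothetical.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition (crc32 is a stand-in for Kafka's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer of a group, round-robin."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

NUM_PARTITIONS = 4
log = defaultdict(list)  # partition -> records; the list index plays the role of the offset

events = [("user-1", "login"), ("user-2", "login"),
          ("user-1", "click"), ("user-1", "logout")]
for key, value in events:
    log[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Every record for user-1 sits in one partition, in append order.
user1_partition = partition_for("user-1", NUM_PARTITIONS)
user1_events = [v for k, v in log[user1_partition] if k == "user-1"]

# Six consumers in one group, four partitions: two consumers get nothing.
assignment = assign_partitions(NUM_PARTITIONS, [f"c{i}" for i in range(6)])
idle = [c for c, parts in assignment.items() if not parts]
```

Here `user1_events` keeps the order the producer wrote, and `idle` lists the two consumers that exceed the partition count, which is the parallelism ceiling described above.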
Performance envelope
Kafka performance envelope — the numbers to quote.
| Dimension | Number | Why it matters |
|---|---|---|
| Throughput | ~100K–1M+ msgs/sec per broker | Sequential disk append + batching; scales by adding brokers/partitions |
| Latency | Single-digit to tens of ms (acks=all) | Not a low-latency RPC; fine for streaming, not for a 5ms hot path |
| Retention | Hours to forever (size/time or compaction) | Replay and multi-consumer fan-out depend on it |
| Parallelism | = partition count per consumer group | You cannot have more active consumers than partitions |
| Durability | acks=all + ISR ≥ 2 → survives a broker loss | The knob that trades latency for no data loss |
| Ordering | Per partition only | Global ordering means one partition = no parallelism |
Capabilities in interviews
Event backbone / pub-sub
Publish an event once; many independent consumer groups react without the producer knowing them.
The canonical use. A service publishes domain events to a topic; each downstream consumes as its own group:
# producer: one write
send("orders", key=order_id, value=OrderPlaced{...})
# three independent consumer groups, three offsets, one stream
billing-group → charge the card
search-group → index the order
analytics-group → update the dashboard
Adding a new consumer is zero-touch for the producer: you just start a new group. This is what "decouple with an event backbone" means in a design, and why Kafka beats point-to-point calls when the number of downstreams grows.
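To make the independent-offset idea concrete, here is a minimal sketch of a single-partition retained log with per-group offsets. The `Topic` class and its methods are hypothetical and are not a Kafka client API; committing on poll is a shortcut taken for brevity.

```python
class Topic:
    """In-memory stand-in for one retained partition with per-group offsets."""

    def __init__(self):
        self.records = []   # the log: reading never removes anything
        self.offsets = {}   # group name -> next offset to read

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1   # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> list:
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)   # commit-on-poll, for brevity only
        return batch

    def seek(self, group: str, offset: int) -> None:
        self.offsets[group] = offset   # rewind (or skip) for one group only

orders = Topic()
for order_id in ("o-1", "o-2", "o-3"):
    orders.append({"type": "OrderPlaced", "order_id": order_id})

billing_seen = [r["order_id"] for r in orders.poll("billing-group")]
search_seen = [r["order_id"] for r in orders.poll("search-group", max_records=2)]

# A group started later still sees the whole retained stream.
analytics_seen = [r["order_id"] for r in orders.poll("analytics-group")]

# Replay: rewinding billing does not disturb the other groups.
orders.seek("billing-group", 0)
billing_replay = [r["order_id"] for r in orders.poll("billing-group")]
```

The producer side never changes: a new group is just a new entry in `offsets`.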
Choose this variant when
- Many services react to the same domain events
- You expect to add consumers over time
- Decoupling producers from a growing set of subscribers
High-throughput ingestion buffer
Absorb a firehose of writes and let downstream stores consume at their own pace.
Clickstreams, logs, metrics, ad impressions, IoT telemetry — millions of events/sec that no OLTP database can take directly. Kafka sits in front as a durable buffer:
clients → Kafka (buffer + retain) → stream processor → warehouse / OLAP / time-series DB
The buffer decouples produce rate from consume rate (smoothing bursts) and gives you replay if a downstream consumer has a bug. This is the standard shape for the ad-click aggregator and any analytics ingestion pipeline.
Choose this variant when
- Write volume exceeds what the destination store can absorb
- Bursty ingestion that needs smoothing
- Analytics / metrics / log pipelines
CDC & outbox transport
Carry every committed database change as an ordered, replayable event stream.
Pair Kafka with change-data-capture (Debezium) or the outbox pattern: every committed row change becomes a Kafka record, keyed by primary key so changes per row stay ordered. Downstream consumers rebuild search indexes, caches, and warehouses from the stream — and recover from bugs by resetting the offset and replaying.
This is the backbone of keeping derived stores in sync without dual writes, and the reason Kafka's retention + replay matter so much: your read models are rebuildable.
Choose this variant when
- Keeping search/cache/warehouse in sync with a DB
- Event sourcing or a derived-view architecture
- You need to rebuild a downstream by replaying history
Stream processing source/sink
Feed real-time aggregations, joins, and windowed computation with Kafka Streams or Flink.
Kafka is the durable substrate under stream processors. A job reads from a topic, computes windowed aggregates or joins two streams, and writes results back to another topic:
raw-clicks → [Flink: 1-min tumbling window count by ad_id] → click-counts topic → store
Kafka's ordered, replayable partitions are what let the processor recover exactly where it left off after a failure. For the heavy lifting (windowing, state, joins) you reach for Flink; Kafka is the transport and the checkpoint log.
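The windowed count in that pipeline can be sketched in plain Python. This is not Flink or Kafka Streams code: it assumes event timestamps in milliseconds, ignores watermarks and late data, and uses a hypothetical `tumbling_counts` helper to show only the bucketing arithmetic.

```python
from collections import Counter

WINDOW_MS = 60_000  # 1-minute tumbling window

def tumbling_counts(clicks, window_ms=WINDOW_MS):
    """Count clicks per (window_start_ms, ad_id).

    clicks is an iterable of (timestamp_ms, ad_id) pairs.
    """
    counts = Counter()
    for timestamp_ms, ad_id in clicks:
        # Each timestamp falls in exactly one window, aligned to window_ms.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, ad_id)] += 1
    return dict(counts)

raw_clicks = [
    (1_000, "ad-1"), (30_000, "ad-1"), (59_999, "ad-2"),
    (60_000, "ad-1"), (119_000, "ad-1"),
]
click_counts = tumbling_counts(raw_clicks)
```

Because windows do not overlap, replaying the same input from Kafka produces the same counts, which is what makes recovery by replay workable.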
Choose this variant when
- Real-time aggregation / windowing / stream joins
- Materialising live views from an event stream
- Pairing with Flink or Kafka Streams
Operating knobs
Partition count
Partitions set both ordering granularity and max consumer parallelism. Too few and you cap throughput; too many and you pay rebalance, metadata, and open-file overhead. Pick the partition key to match the ordering you need (per user, per order), and size the count to peak throughput ÷ per-consumer throughput, with headroom — increasing partitions later re-shuffles keys.
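The sizing rule is simple arithmetic, sketched below. The `partitions_needed` helper and the 25% default headroom are assumptions for illustration, not a Kafka recommendation.

```python
import math

def partitions_needed(peak_per_sec: int, per_consumer_per_sec: int,
                      headroom: float = 1.25) -> int:
    """Partitions = peak throughput / per-consumer throughput, plus headroom."""
    if per_consumer_per_sec <= 0:
        raise ValueError("per_consumer_per_sec must be positive")
    return math.ceil(peak_per_sec * headroom / per_consumer_per_sec)

# 500K events/sec, each consumer instance draining about 20K/sec.
baseline = partitions_needed(500_000, 20_000, headroom=1.0)
with_headroom = partitions_needed(500_000, 20_000)
```

With these inputs the baseline is 25 and the padded figure is 32.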
acks & min.insync.replicas
acks=all + min.insync.replicas=2 means a write must be on at least two in-sync brokers before it is acknowledged — no data loss on a single broker failure, at higher latency. acks=1 is faster but loses un-replicated writes if the leader dies. This is the central durability-vs-latency dial.
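The dial can be expressed as a small decision function. This is a model of the broker's behaviour, not client code: `min_copies_when_acked` is a hypothetical name, `isr_size` counts the leader, and `acks=0` is left out for brevity.

```python
from typing import Optional

def min_copies_when_acked(acks: str, isr_size: int,
                          min_insync_replicas: int) -> Optional[int]:
    """Brokers guaranteed to hold a record when the producer gets its ack.

    Returns None when the write is rejected instead of acknowledged.
    """
    if acks == "1":
        return 1  # the leader alone; followers may not have the record yet
    if acks == "all":
        if isr_size < min_insync_replicas:
            return None  # broker refuses the write (NotEnoughReplicas)
        return isr_size  # every in-sync replica has the record
    raise ValueError(f"unsupported acks value: {acks!r}")

healthy = min_copies_when_acked("all", isr_size=3, min_insync_replicas=2)
degraded = min_copies_when_acked("all", isr_size=2, min_insync_replicas=2)
rejected = min_copies_when_acked("all", isr_size=1, min_insync_replicas=2)
fast_path = min_copies_when_acked("1", isr_size=3, min_insync_replicas=2)
```

Any result of 2 or more survives the loss of one broker; `fast_path` returns 1, which is the un-replicated window that acks=1 accepts.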
Retention & compaction
Time/size retention deletes old records (good for buffers and replay windows). Log compaction instead keeps the latest value per key forever (good for CDC / changelog topics where you want the current state of every key). Choose retention to cover your worst-case replay/recovery need.
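A rough sketch of what compaction keeps, assuming a log given as (key, value) pairs in offset order and a `None` value acting as a tombstone. Real compaction runs in the background on closed segments and keeps tombstones for a while (`delete.retention.ms`) before dropping them; the `compact` function below shows only the end state.

```python
def compact(log):
    """Return surviving (offset, key, value) records: the latest per key.

    A None value is treated as a tombstone that removes the key.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [(offset, key, value)
                 for key, (offset, value) in latest.items()
                 if value is not None]
    return sorted(survivors)  # surviving records keep their original offsets

changelog = [
    ("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d"), ("u2", None),
]
compacted = compact(changelog)
```

A consumer replaying the compacted topic sees the current state of every live key without reading its full history.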
Delivery semantics
Default is at-least-once: commit the offset after processing, accept duplicates, make consumers idempotent. Kafka transactions add exactly-once within Kafka (read-process-write between topics), but the moment you write to an external DB you are back to at-least-once end-to-end — so idempotency is non-negotiable.
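A minimal sketch of commit-after-processing with a deduplicating consumer. The `IdempotentConsumer` class and the `event_id` field are hypothetical, and the in-memory set stands in for a unique-key table that a real system would update in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Applies each event at most once even when the log re-delivers it."""

    def __init__(self):
        self.processed_ids = set()   # stand-in for a unique-key table
        self.balance = 0             # the external side effect
        self.committed_offset = 0
        self.duplicates_skipped = 0

    def handle(self, event) -> None:
        if event["event_id"] in self.processed_ids:
            self.duplicates_skipped += 1
            return
        self.balance += event["amount"]
        self.processed_ids.add(event["event_id"])

    def run(self, log, crash_before_commit_at=None) -> None:
        for offset in range(self.committed_offset, len(log)):
            self.handle(log[offset])
            if offset == crash_before_commit_at:
                return  # simulated crash: side effect done, offset not committed
            self.committed_offset = offset + 1  # commit only after processing

payments = [
    {"event_id": "e-1", "amount": 10},
    {"event_id": "e-2", "amount": 20},
    {"event_id": "e-3", "amount": 30},
]
consumer = IdempotentConsumer()
consumer.run(payments, crash_before_commit_at=1)  # crashes after applying e-2
consumer.run(payments)                            # restart: e-2 is re-delivered
```

Without the dedupe check the restart would apply e-2 twice; with it, the re-delivery is counted and skipped.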
Versus the alternatives
Kafka vs other messaging options.
| Dimension | Kafka | SQS / RabbitMQ | Redis Streams |
|---|---|---|---|
| Model | Retained, replayable partitioned log | Drained queue (delete on ack) | Lightweight log + consumer groups |
| Throughput | 100K–1M+/sec per broker | High (SQS) / moderate (RabbitMQ) | 100K+/sec (single node) |
| Replay | Yes — rewind the offset | No — message gone after ack | Limited (in-memory, bounded) |
| Fan-out | Many independent consumer groups | Needs SNS/exchanges to fan out | Consumer groups, single node |
| Best for | Event streaming, CDC, analytics ingest | Simple task queues with DLQ/retries | Queues without standing up Kafka |
Failure modes & gotchas
If consumers fall behind the produce rate, lag grows until records age out of retention and are silently lost. Monitor consumer lag (e.g. via Burrow) as the leading signal; autoscale consumers up to the partition count, and remember you cannot add parallelism beyond the partition count without repartitioning.
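Lag itself is just a per-partition subtraction, sketched here with hypothetical inputs. Treating a partition with no committed offset as starting from 0 is an assumption of this sketch.

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag = log end offset minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }

log_end = {0: 1_000, 1: 1_200, 2: 900}
committed = {0: 1_000, 1: 700}        # partition 2 has no commit yet
lag = consumer_lag(log_end, committed)
total_lag = sum(lag.values())
```

Watching the trend matters more than the absolute number: lag that keeps growing means the group is consuming slower than producers write.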
Kafka optimises throughput, not per-message latency. Putting it on a synchronous sub-100ms request path adds batching and poll latency you will regret. Use it for async streaming; keep it off the hot request path.
A bad key (e.g. partition by a value where one bucket dominates) sends most traffic to one partition, capping throughput at one consumer regardless of cluster size. Choose a high-cardinality, evenly-distributed key; add a salt for known-hot keys.
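One way to salt a known-hot key is to append a rotating suffix so its records spread over several partitions. The names below (`HOT_KEYS`, `SALT_BUCKETS`, `salted_key`) are hypothetical. The cost is that per-key ordering no longer holds for the salted key, and consumers have to merge the buckets back together.

```python
from itertools import count

HOT_KEYS = {"bot-account"}   # keys known in advance to dominate traffic
SALT_BUCKETS = 4             # how many sub-keys a hot key is spread over
_sequence = count()

def salted_key(key: str, hot_keys=HOT_KEYS, buckets: int = SALT_BUCKETS) -> str:
    """Leave normal keys alone; rotate hot keys across a fixed set of sub-keys."""
    if key not in hot_keys:
        return key
    return f"{key}#{next(_sequence) % buckets}"

hot = {salted_key("bot-account") for _ in range(8)}
normal = {salted_key("user-42") for _ in range(8)}
```

The partitioner then hashes the salted key as usual, so the hot key's traffic can land on up to `SALT_BUCKETS` partitions instead of one.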
Kafka transactions give exactly-once within Kafka. As soon as a consumer writes to an external DB or calls an API, it is at-least-once again. The honest answer is at-least-once delivery with idempotent consumers — say that, not "exactly-once."
Too few partitions cap parallelism and throughput; too many balloon rebalance time, controller metadata, and open file handles, and raise end-to-end latency. Size deliberately and leave headroom rather than repartitioning a live topic, which reshuffles the key→partition assignment.
In production
LinkedIn
Where Kafka was born — trillions of messages a day
Kafka was created at LinkedIn to replace a tangle of point-to-point data pipelines with a single unified log, and LinkedIn still runs one of the largest deployments in the world — on the order of 7 trillion messages per day across thousands of brokers and hundreds of clusters. Every meaningful event (profile views, page views, metrics, database changes via CDC) is published once to Kafka and consumed by dozens of downstream systems: search indexing, the news feed, analytics, monitoring, and the data warehouse.
The architectural lesson is exactly the interview framing: instead of N producers each integrating with M consumers (an N×M mess), everything publishes to the log and everything reads from it, so adding a new data consumer is a new consumer group, not a new integration. The partition-per-key model and consumer-group fan-out are what let a single stream feed the entire company.
Uber
Trillions of messages powering real-time pricing and safety
Uber runs one of the largest Kafka deployments outside LinkedIn — trillions of messages and petabytes of data per day — as the backbone of its real-time platform. Driver and rider location pings, trip events, and app interactions flow through Kafka into stream processors that compute surge pricing, match riders to drivers, detect fraud, and feed dashboards, all within seconds.
Uber's engineering write-ups emphasize the same durability and replay properties: acks=all with replication so a broker failure cannot lose a trip event, partitioning by keys like city or trip id to preserve per-entity ordering, and retention that lets them reprocess streams when a pipeline changes. They pair Kafka (transport + retention) with stream processors (computation) — the exact "Kafka moves the stream, a processor computes on it" split.
Good vs bad answer
Interviewer probe
“An ad-click system ingests 500K events/sec and several teams (billing, analytics, fraud) all need the data. How do events flow?”
Weak answer
"I'd write the clicks into a database, and each team can query the table they need. If it's too much, I'll add a queue in front so writes don't overwhelm the DB."
Strong answer
"A Kafka topic clicks, partitioned by ad_id so per-ad ordering holds and load spreads. Producers append at 500K/sec — sequential log writes handle that on a handful of brokers with acks=all and ISR ≥ 2 so a broker loss doesn't drop committed clicks. Then each team is an independent consumer group on the same topic: billing aggregates spend, analytics windows counts into the warehouse, fraud scores in real time — none of them know about each other, and a new consumer is zero-touch for producers. Retention (say 3–7 days) gives every team replay if a consumer has a bug. For the analytics rollups I'd put Flink between Kafka and the store for windowed aggregation. The key thing: Kafka is a retained log, so one ingest stream feeds all consumers and survives downstream failure — a database table in front of a queue gives me neither replay nor clean fan-out."
Why it wins: Names the topic + partition key with the ordering reason, sizes throughput against Kafka's real envelope, sets durability with acks/ISR, uses consumer groups for clean multi-team fan-out, and leans on retention for replay — versus a DB-centric answer that loses fan-out and replay.
Interview playbook
When it comes up
- High write volume — clickstreams, logs, metrics, IoT, ingestion pipelines
- Several consumers need the same event stream
- Decoupling services with an event backbone, CDC, or event sourcing
- The interviewer asks "how do these services communicate at scale?"
Order of reveal
1. Frame it as a log, not a queue. I use Kafka as a durable, replayable log: producers append, consumers read by offset, data is retained so many consumers share one stream.
2. Name the partition key. I partition by the key whose order I need (ad_id, user_id); ordering holds per partition and that key also spreads load.
3. Consumer groups for fan-out. Each downstream is its own consumer group with independent offsets, so one stream feeds billing, search, and analytics without coupling.
4. Set durability. acks=all with min.insync.replicas=2 so a committed record survives a broker failure; the latency cost is worth it here.
5. Idempotent consumers. Delivery is at-least-once, so consumers dedupe on an event id; I do not claim exactly-once across an external write.
Signature phrases
- “Kafka is a retained log you replay, not a queue you drain.” — The core mental model that drives every correct follow-up.
- “Ordering is per partition; the key is the design decision.” — Shows you understand the one real constraint.
- “One stream, many consumer groups, independent offsets.” — Captures why Kafka beats point-to-point for fan-out.
- “acks=all + ISR ≥ 2, and consumers are idempotent.” — Demonstrates durability and honest delivery semantics.
Likely follow-ups
“How do you guarantee no message is lost?”
On the produce side, acks=all with min.insync.replicas=2 so a write is only acknowledged once it is on at least two in-sync brokers — a single broker failure loses nothing, and the producer retries on failure with idempotent-producer enabled to avoid duplicates from retries. On the consume side, commit the offset after the side effect completes, so a crash re-delivers rather than skips. Combined with replication factor 3, that is durable end to end. The cost is latency, which is acceptable for a streaming workload.
“A consumer is falling behind, and lag keeps growing. What do you do?”
First confirm via consumer-lag monitoring. If the consumer is CPU-bound, scale out by adding consumers in the group — but only up to the partition count, since that is the parallelism ceiling. If you are already at the partition count, you must increase partitions (accepting key re-shuffle) or make the consumer cheaper per message (batching, async I/O). If a downstream dependency is the bottleneck, the queue is just hiding a capacity problem — fix the consumer's throughput, do not let lag silently age data out of retention.
“When would you pick SQS or RabbitMQ over Kafka?”
When I need a simple task queue, not an event log: per-message acks, easy retries with a built-in dead-letter queue, no need to retain or replay, and lower operational weight. SQS is fully managed and scales without capacity planning for fire-and-forget background jobs; RabbitMQ gives rich routing and priorities. Kafka earns its complexity when I need high throughput, retention/replay, ordered partitions, and many independent consumer groups; if I don't need those, it is over-engineering.
Worked example
Setup. The interviewer wants the ingestion backbone for an analytics platform: 500K user-activity events/sec from web and mobile, feeding three independent consumers — a real-time dashboard, a data warehouse, and a fraud detector — with no consumer able to slow down the others or the producers.
The move. One Kafka topic, activity, partitioned by user_id so all events for a user stay ordered and load spreads evenly across partitions. I size partitions to throughput: at 500K/sec and ~20K/sec drained per consumer instance, I need ~25 consumers per group, so I provision ~32 partitions (round number with headroom — and remember partition count is the parallelism ceiling per group). Producers append with acks=all and min.insync.replicas=2 on replication factor 3, so a broker loss never drops a committed event.
Fan-out via consumer groups. Each downstream is its own consumer group with independent offsets: dashboard, warehouse, fraud. They read the same stream at their own pace and never interfere — the dashboard can lag during a spike without holding back the warehouse load, and adding a fourth consumer later is zero-touch for producers. This is the property a database-and-queue design cannot give cleanly.
Retention + replay. I set 7-day retention, which is the safety net: if the warehouse consumer ships a bug that mangles a day of data, I fix the code, reset that group's offset to yesterday, and replay — without touching the other consumers or the producers. Retention is what makes the derived stores rebuildable.
What breaks. The first failure is consumer lag on the fraud detector if its scoring model slows down; I monitor lag (Burrow) as the leading signal and autoscale consumers up to the partition count. The second risk is a hot partition if user_id skews (a bot account) — I watch per-partition throughput and, for a known-hot key, add a salt. I never put this on a synchronous request path: it is an async backbone, latency in the tens of ms, not a hot-path RPC.
The result. 500K events/sec absorbed on a handful of brokers, three consumers reading one durable stream independently, a broker can die without data loss, and any consumer can rewind a week — the canonical "decouple with a retained log" backbone.
Cheat sheet
- Kafka = distributed, partitioned, replicated append-only log. Retained + replayable, not drained.
- Ordering is per partition only. Partition by the key whose order matters; that key also spreads load.
- Parallelism within a group = partition count. Add groups for independent fan-out.
- Throughput ~100K–1M+/sec per broker; latency is ms, not μs, so use it async, not on the hot path.
- Durability: acks=all + min.insync.replicas=2 + RF 3 → survives a broker loss.
- Delivery is at-least-once; commit offset after the side effect; consumers must be idempotent.
- Retention or log compaction (latest value per key). Replay = reset the offset.
- Monitor consumer lag: it is the earliest overload signal.
Drills
Why can you not have more active consumers than partitions in a group?
Because within a consumer group, each partition is assigned to exactly one consumer — that is how Kafka preserves per-partition order without coordination. Extra consumers beyond the partition count sit idle with nothing assigned. So the partition count is the hard ceiling on parallelism for a group; to scale further you increase partitions (which reshuffles the key→partition mapping).
Interviewer: "is Kafka exactly-once?"
Within Kafka, yes — transactions make a consume-process-produce loop between Kafka topics atomic. End-to-end against an external system, no: once a consumer writes to a database or calls an API, a crash between the side effect and the offset commit means at-least-once. The honest design is at-least-once delivery with idempotent consumers (dedupe on an event id). Claiming end-to-end exactly-once is a red flag.
You need strict global ordering of all events. What does that cost in Kafka?
One partition — that is the only way to get a total order in Kafka. Which means a single consumer, capping throughput at what one consumer can do, with no horizontal scaling. Usually the right move is to push back: do you need global order, or per-entity order? Per-entity (per user, per account) lets you partition by that key and scale, while preserving the order that actually matters. Global ordering almost never survives scrutiny.