Technology·Messaging & queues

Apache Kafka

A distributed, partitioned, replicated append-only log — the default backbone for high-throughput event streaming, decoupling services, and feeding many consumers from one durable stream.

Also worth naming: Confluent Platform · Amazon MSK · Redpanda (API-compatible) · Azure Event Hubs (Kafka API)

~25 min read·15 sections

Kafka is not a queue you drain — it is a durable log you replay. Once you internalise "append-only log, partitioned for scale, consumed by offset," most of its behaviour in an interview follows from first principles.

What it is

Kafka is a distributed commit log. Producers append records to topics; each topic is split into partitions, and each partition is an ordered, append-only sequence where every record gets a monotonically increasing offset. Consumers read forward from an offset they control — and crucially, reading does not delete the data. Records stay for a configured retention (hours, days, or forever with compaction), so many independent consumers can read the same stream, and a broken consumer can rewind and replay.

That one design choice — a retained log rather than a drained queue — is what makes Kafka the backbone of event-driven systems. The same orders topic can feed billing, search indexing, analytics, and fraud detection, each as an independent consumer group with its own offset, without the producer knowing any of them exist.

Kafka's other defining trait is throughput. By appending sequentially to disk, batching, and letting consumers pull, a single broker handles hundreds of thousands to millions of messages per second. In an interview, Kafka is the answer when you need a durable, replayable, high-volume event stream — and the wrong answer when you need a simple task queue or low-latency request/response.

When to reach for it

Reach for this when…

You need a durable, replayable event stream that many independent consumers read
High write throughput — clickstreams, logs, metrics, IoT, CDC change events
Decoupling services with an event backbone (publish once, many subscribers)
Stream processing or event sourcing where ordered, retained history matters

Not really this pattern when…

You need a simple per-message task queue with easy retries and a DLQ (SQS / RabbitMQ are lighter)
You need low-latency synchronous request/response (that is an RPC, not a log)
Per-message TTL, priority ordering, or arbitrary delete — Kafka is a log, not a queue
Tiny scale where running a Kafka cluster is more operational cost than the problem warrants

How it works

Four ideas explain almost everything Kafka does:

1. The partition is the unit of order and parallelism. Ordering is guaranteed within a partition, never across partitions. The producer chooses a partition by hashing a key (e.g. user_id), so all records for one key land in one partition and stay ordered. This is the single most important design decision: pick the key whose ordering you need to preserve, and accept that throughput is capped by partition count.

Architecture diagram· A topic is a partitioned, append-only log

Producers append to the end of a partition; each record gets a monotonic offset. Consumers track their own offset and read forward — the log is not deleted on read, so many independent consumers replay the same data.

2. Consumer groups give you scaling and fan-out at once. Within a group, each partition is owned by exactly one consumer, so you scale a workload by adding consumers up to the partition count. Add a second group and it reads the whole topic independently with its own offsets — that is how one stream feeds many systems.

Architecture diagram· Partitions are the unit of parallelism; groups give independent reads

Within one consumer group, each partition is owned by exactly one consumer — so max parallelism equals the partition count. Different groups read the same topic independently with their own offsets, which is how one event stream feeds many systems.

3. Replication is how a committed record survives failure. Each partition has one leader and N followers; the in-sync replicas (ISR) are the followers caught up to the leader. With acks=all and min.insync.replicas=2, a write is only acknowledged once it is on multiple brokers — so a single broker dying loses nothing. Lower the acks for speed and you trade away that guarantee.

Architecture diagram· Each partition is replicated; one leader, N followers, an in-sync set

Producers write to the partition leader; followers replicate. A write is committed once all in-sync replicas (ISR) have it. acks=all + min.insync.replicas trades latency for the guarantee that a committed record survives a broker failure.

4. Consumers pull and track their own offset. Kafka does not push or track per-message acks like a traditional broker. The consumer commits an offset meaning "I've processed up to here." Commit after the side effect and a crash re-delivers (at-least-once); commit before and a crash drops (at-most-once). This is why Kafka delivery is at-least-once in practice and consumers must be idempotent.

Performance envelope

Kafka performance envelope — the numbers to quote.

Dimension	Number	Why it matters
Throughput	~100K–1M+ msgs/sec per broker	Sequential disk append + batching; scales by adding brokers/partitions
Latency	Single-digit to tens of ms (acks=all)	Not a low-latency RPC; fine for streaming, not for a 5ms hot path
Retention	Hours to forever (size/time or compaction)	Replay and multi-consumer fan-out depend on it
Parallelism	= partition count per consumer group	You cannot have more active consumers than partitions
Durability	acks=all + ISR ≥ 2 → survives a broker loss	The knob that trades latency for no data loss
Ordering	Per partition only	Global ordering means one partition = no parallelism

Capabilities in interviews

Event backbone / pub-sub

Publish an event once; many independent consumer groups react without the producer knowing them.

The canonical use. A service publishes domain events to a topic; each downstream consumes as its own group:

text

# producer: one write
send("orders", key=order_id, value=OrderPlaced{...})

# three independent consumer groups, three offsets, one stream
billing-group     → charge the card
search-group      → index the order
analytics-group   → update the dashboard

Adding a new consumer is zero-touch for the producer — you just start a new group. This is what "decouple with an event backbone" means in a design, and why Kafka beats point-to-point calls when the number of downstreams grows.

Choose this variant when

Many services react to the same domain events
You expect to add consumers over time
Decoupling producers from a growing set of subscribers

High-throughput ingestion buffer

Absorb a firehose of writes and let downstream stores consume at their own pace.

Clickstreams, logs, metrics, ad impressions, IoT telemetry — millions of events/sec that no OLTP database can take directly. Kafka sits in front as a durable buffer:

text

clients → Kafka (buffer + retain) → stream processor → warehouse / OLAP / time-series DB

The buffer decouples produce rate from consume rate (smoothing bursts) and gives you replay if a downstream consumer has a bug. This is the standard shape for the ad-click aggregator and any analytics ingestion pipeline.

Choose this variant when

Write volume exceeds what the destination store can absorb
Bursty ingestion that needs smoothing
Analytics / metrics / log pipelines

CDC & outbox transport

Carry every committed database change as an ordered, replayable event stream.

Pair Kafka with change-data-capture (Debezium) or the outbox pattern: every committed row change becomes a Kafka record, keyed by primary key so changes per row stay ordered. Downstream consumers rebuild search indexes, caches, and warehouses from the stream — and recover from bugs by resetting the offset and replaying.

This is the backbone of keeping derived stores in sync without dual writes, and the reason Kafka's retention + replay matter so much: your read models are rebuildable.

Choose this variant when

Keeping search/cache/warehouse in sync with a DB
Event sourcing or a derived-view architecture
You need to rebuild a downstream by replaying history

Stream processing source/sink

Feed real-time aggregations, joins, and windowed computation with Kafka Streams or Flink.

Kafka is the durable substrate under stream processors. A job reads from a topic, computes windowed aggregates or joins two streams, and writes results back to another topic:

text

raw-clicks → [Flink: 1-min tumbling window count by ad_id] → click-counts topic → store

Kafka's ordered, replayable partitions are what let the processor recover exactly where it left off after a failure. For the heavy lifting (windowing, state, joins) you reach for Flink; Kafka is the transport and the checkpoint log.

Choose this variant when

Real-time aggregation / windowing / stream joins
Materialising live views from an event stream
Pairing with Flink or Kafka Streams

Operating knobs

Partition count

Partitions set both ordering granularity and max consumer parallelism. Too few and you cap throughput; too many and you pay rebalance, metadata, and open-file overhead. Pick the partition key to match the ordering you need (per user, per order), and size the count to peak throughput ÷ per-consumer throughput, with headroom — increasing partitions later re-shuffles keys.

acks & min.insync.replicas

acks=all + min.insync.replicas=2 means a write must be on at least two in-sync brokers before it is acknowledged — no data loss on a single broker failure, at higher latency. acks=1 is faster but loses un-replicated writes if the leader dies. This is the central durability-vs-latency dial.

Retention & compaction

Time/size retention deletes old records (good for buffers and replay windows). Log compaction instead keeps the latest value per key forever (good for CDC / changelog topics where you want the current state of every key). Choose retention to cover your worst-case replay/recovery need.

Delivery semantics

Default is at-least-once: commit the offset after processing, accept duplicates, make consumers idempotent. Kafka transactions add exactly-once within Kafka (read-process-write between topics), but the moment you write to an external DB you are back to at-least-once end-to-end — so idempotency is non-negotiable.

Versus the alternatives

Kafka vs other messaging options.

Dimension	Kafka	SQS / RabbitMQ	Redis Streams
Model	Retained, replayable partitioned log	Drained queue (delete on ack)	Lightweight log + consumer groups
Throughput	100K–1M+/sec per broker	High (SQS) / moderate (RabbitMQ)	100K+/sec (single node)
Replay	Yes — rewind the offset	No — message gone after ack	Limited (in-memory, bounded)
Fan-out	Many independent consumer groups	Needs SNS/exchanges to fan out	Consumer groups, single node
Best for	Event streaming, CDC, analytics ingest	Simple task queues with DLQ/retries	Queues without standing up Kafka

Failure modes & gotchas

Consumer lag and unbounded backlog

If consumers fall behind the produce rate, lag grows until records age out of retention and are silently lost. Monitor consumer lag (e.g. via Burrow) as the leading signal; autoscale consumers up to the partition count, and remember you cannot add parallelism beyond the partition count without repartitioning.

Treating Kafka as a low-latency queue

Kafka optimises throughput, not per-message latency. Putting it on a synchronous sub-100ms request path adds batching and poll latency you will regret. Use it for async streaming; keep it off the hot request path.

Hot / skewed partitionAdvanced

A bad key (e.g. partition by a value where one bucket dominates) sends most traffic to one partition, capping throughput at one consumer regardless of cluster size. Choose a high-cardinality, evenly-distributed key; add a salt for known-hot keys.

Claiming exactly-once end-to-endAdvanced

Kafka transactions give exactly-once within Kafka. As soon as a consumer writes to an external DB or calls an API, it is at-least-once again. The honest answer is at-least-once delivery with idempotent consumers — say that, not "exactly-once."

Too many or too few partitionsAdvanced

Too few caps parallelism and throughput; too many balloons rebalance time, controller metadata, and open file handles, and raises end-to-end latency. Size deliberately and leave headroom rather than repartitioning a live topic, which reshuffles key→partition assignment.

In production

Where Kafka was born — trillions of messages a day

Kafka was created at LinkedIn to replace a tangle of point-to-point data pipelines with a single unified log, and LinkedIn still runs one of the largest deployments in the world — on the order of 7 trillion messages per day across thousands of brokers and hundreds of clusters. Every meaningful event (profile views, page views, metrics, database changes via CDC) is published once to Kafka and consumed by dozens of downstream systems: search indexing, the news feed, analytics, monitoring, and the data warehouse.

The architectural lesson is exactly the interview framing: instead of N producers each integrating with M consumers (an N×M mess), everything publishes to the log and everything reads from it, so adding a new data consumer is a new consumer group, not a new integration. The partition-per-key model and consumer-group fan-out are what let a single stream feed the entire company.

Takeaway: Kafka replaces N×M point-to-point pipelines with one log — publish an event once, let any number of independent consumer groups read it without touching producers.

Uber

Trillions of messages powering real-time pricing and safety

Uber runs one of the largest Kafka deployments outside LinkedIn — trillions of messages and petabytes of data per day — as the backbone of its real-time platform. Driver and rider location pings, trip events, and app interactions flow through Kafka into stream processors that compute surge pricing, match riders to drivers, detect fraud, and feed dashboards, all within seconds.

Uber's engineering write-ups emphasize the same durability and replay properties: acks=all with replication so a broker failure cannot lose a trip event, partitioning by keys like city or trip id to preserve per-entity ordering, and retention that lets them reprocess streams when a pipeline changes. They pair Kafka (transport + retention) with stream processors (computation) — the exact "Kafka moves the stream, a processor computes on it" split.

Takeaway: At extreme scale Kafka stays the durable transport + retention layer; pair it with a stream processor for computation, and partition by the key whose ordering matters (trip, city).

Good vs bad answer

Interviewer probe

“An ad-click system ingests 500K events/sec and several teams (billing, analytics, fraud) all need the data. How do events flow?”

Weak answer

"I'd write the clicks into a database, and each team can query the table they need. If it's too much, I'll add a queue in front so writes don't overwhelm the DB."

Strong answer

"A Kafka topic clicks, partitioned by ad_id so per-ad ordering holds and load spreads. Producers append at 500K/sec — sequential log writes handle that on a handful of brokers with acks=all and ISR ≥ 2 so a broker loss doesn't drop committed clicks. Then each team is an independent consumer group on the same topic: billing aggregates spend, analytics windows counts into the warehouse, fraud scores in real time — none of them know about each other, and a new consumer is zero-touch for producers. Retention (say 3–7 days) gives every team replay if a consumer has a bug. For the analytics rollups I'd put Flink between Kafka and the store for windowed aggregation. The key thing: Kafka is a retained log, so one ingest stream feeds all consumers and survives downstream failure — a database table in front of a queue gives me neither replay nor clean fan-out."

Why it wins: Names the topic + partition key with the ordering reason, sizes throughput against Kafka's real envelope, sets durability with acks/ISR, uses consumer groups for clean multi-team fan-out, and leans on retention for replay — versus a DB-centric answer that loses fan-out and replay.

Interview playbook

Interview playbook3–4 min when an event backbone or high-volume ingestion is central to the design

When it comes up

High write volume — clickstreams, logs, metrics, IoT, ingestion pipelines
Several consumers need the same event stream
Decoupling services with an event backbone, CDC, or event sourcing
The interviewer asks "how do these services communicate at scale?"

Order of reveal

1
1. Frame it as a log, not a queue. I use Kafka as a durable, replayable log — producers append, consumers read by offset, data is retained so many consumers share one stream.
2
2. Name the partition key. I partition by the key whose order I need (ad_id, user_id); ordering holds per partition and that key also spreads load.
3
3. Consumer groups for fan-out. Each downstream is its own consumer group with independent offsets, so one stream feeds billing, search, and analytics without coupling.
4
4. Set durability. acks=all with min.insync.replicas=2 so a committed record survives a broker failure — the latency cost is worth it here.
5
5. Idempotent consumers. Delivery is at-least-once, so consumers dedupe on an event id; I do not claim exactly-once across an external write.

Signature phrases

“Kafka is a retained log you replay, not a queue you drain.”

“Ordering is per partition; the key is the design decision.”

“One stream, many consumer groups, independent offsets.”

“acks=all + ISR ≥ 2, and consumers are idempotent.”

“Kafka is a retained log you replay, not a queue you drain.” — The core mental model that drives every correct follow-up.
“Ordering is per partition; the key is the design decision.” — Shows you understand the one real constraint.
“One stream, many consumer groups, independent offsets.” — Captures why Kafka beats point-to-point for fan-out.
“acks=all + ISR ≥ 2, and consumers are idempotent.” — Demonstrates durability and honest delivery semantics.

Likely follow-ups

?“How do you guarantee no message is lost?”Reveal

On the produce side, acks=all with min.insync.replicas=2 so a write is only acknowledged once it is on at least two in-sync brokers — a single broker failure loses nothing, and the producer retries on failure with idempotent-producer enabled to avoid duplicates from retries. On the consume side, commit the offset after the side effect completes, so a crash re-delivers rather than skips. Combined with replication factor 3, that is durable end to end. The cost is latency, which is acceptable for a streaming workload.

?“A consumer is falling behind — lag keeps growing. What do you do?”Reveal

First confirm via consumer-lag monitoring. If the consumer is CPU-bound, scale out by adding consumers in the group — but only up to the partition count, since that is the parallelism ceiling. If you are already at the partition count, you must increase partitions (accepting key re-shuffle) or make the consumer cheaper per message (batching, async I/O). If a downstream dependency is the bottleneck, the queue is just hiding a capacity problem — fix the consumer's throughput, do not let lag silently age data out of retention.

?“When would you pick SQS or RabbitMQ over Kafka?”Reveal

When I need a simple task queue, not an event log: per-message acks, easy retries with a built-in dead-letter queue, no need to retain or replay, and lower operational weight. SQS is fully managed and effectively infinite for fire-and-forget background jobs; RabbitMQ gives rich routing and priorities. Kafka earns its complexity when I need high throughput, retention/replay, ordered partitions, and many independent consumer groups — if I don't need those, it is over-engineering.

Worked example

Setup. The interviewer wants the ingestion backbone for an analytics platform: 500K user-activity events/sec from web and mobile, feeding three independent consumers — a real-time dashboard, a data warehouse, and a fraud detector — with no consumer able to slow down the others or the producers.

The move. One Kafka topic, activity, partitioned by user_id so all events for a user stay ordered and load spreads evenly across partitions. I size partitions to throughput: at 500K/sec and ~20K/sec drained per consumer instance, I need ~25 consumers per group, so I provision ~32 partitions (round number with headroom — and remember partition count is the parallelism ceiling per group). Producers append with acks=all and min.insync.replicas=2 on replication factor 3, so a broker loss never drops a committed event.

Fan-out via consumer groups. Each downstream is its own consumer group with independent offsets: dashboard, warehouse, fraud. They read the same stream at their own pace and never interfere — the dashboard can lag during a spike without holding back the warehouse load, and adding a fourth consumer later is zero-touch for producers. This is the property a database-and-queue design cannot give cleanly.

Retention + replay. I set 7-day retention, which is the safety net: if the warehouse consumer ships a bug that mangles a day of data, I fix the code, reset that group's offset to yesterday, and replay — without touching the other consumers or the producers. Retention is what makes the derived stores rebuildable.

What breaks. The first failure is consumer lag on the fraud detector if its scoring model slows down; I monitor lag (Burrow) as the leading signal and autoscale consumers up to the partition count. The second risk is a hot partition if user_id skews (a bot account) — I watch per-partition throughput and, for a known-hot key, add a salt. I never put this on a synchronous request path: it is an async backbone, latency in the tens of ms, not a hot-path RPC.

The result. 500K events/sec absorbed on a handful of brokers, three consumers reading one durable stream independently, a broker can die without data loss, and any consumer can rewind a week — the canonical "decouple with a retained log" backbone.

Cheat sheet

•Kafka = distributed, partitioned, replicated append-only log. Retained + replayable, not drained.
•Ordering is per partition only. Partition by the key whose order matters; that key also spreads load.
•Parallelism within a group = partition count. Add groups for independent fan-out.
•Throughput ~100K–1M+/sec per broker; latency is ms, not μs — async, not hot-path.
•Durability: acks=all + min.insync.replicas=2 + RF 3 → survives a broker loss.
•Delivery is at-least-once; commit offset after the side effect; consumers must be idempotent.
•Retention or log compaction (latest value per key). Replay = reset the offset.
•Monitor consumer lag — the earliest overload signal.

Drills

Why can you not have more active consumers than partitions in a group?Reveal

Because within a consumer group, each partition is assigned to exactly one consumer — that is how Kafka preserves per-partition order without coordination. Extra consumers beyond the partition count sit idle with nothing assigned. So the partition count is the hard ceiling on parallelism for a group; to scale further you increase partitions (which reshuffles the key→partition mapping).

Interviewer: "is Kafka exactly-once?"Reveal

Within Kafka, yes — transactions make a consume-process-produce loop between Kafka topics atomic. End-to-end against an external system, no: once a consumer writes to a database or calls an API, a crash between the side effect and the offset commit means at-least-once. The honest design is at-least-once delivery with idempotent consumers (dedupe on an event id). Claiming end-to-end exactly-once is a red flag.

You need strict global ordering of all events. What does that cost in Kafka?Reveal

One partition — that is the only way to get a total order in Kafka. Which means a single consumer, capping throughput at what one consumer can do, with no horizontal scaling. Usually the right move is to push back: do you need global order, or per-entity order? Per-entity (per user, per account) lets you partition by that key and scale, while preserving the order that actually matters. Global ordering almost never survives scrutiny.

What it is

Apache Kafka

A distributed, partitioned, replicated append-only log — the default backbone for high-throughput event streaming, decoupling services, and feeding many consumers from one durable stream.

Also worth naming: Confluent Platform · Amazon MSK · Redpanda (API-compatible) · Azure Event Hubs (Kafka API)

~25 min read·15 sections

Dimension

Number

Why it matters

Throughput

~100K–1M+ msgs/sec per broker

Sequential disk append + batching; scales by adding brokers/partitions

Latency

Single-digit to tens of ms (acks=all)

Not a low-latency RPC; fine for streaming, not for a 5ms hot path

Retention

Hours to forever (size/time or compaction)

Replay and multi-consumer fan-out depend on it

Parallelism

= partition count per consumer group

You cannot have more active consumers than partitions

Durability

acks=all + ISR ≥ 2 → survives a broker loss

The knob that trades latency for no data loss

Ordering

Per partition only

Global ordering means one partition = no parallelism

# producer: one write send("orders", key=order_id, value=OrderPlaced{...}) # three independent consumer groups, three offsets, one stream billing-group → charge the card search-group → index the order analytics-group → update the dashboard

Dimension

Kafka

SQS / RabbitMQ

Redis Streams

Model

Retained, replayable partitioned log

Drained queue (delete on ack)

Lightweight log + consumer groups

Throughput

100K–1M+/sec per broker

High (SQS) / moderate (RabbitMQ)

100K+/sec (single node)

Replay

Yes — rewind the offset

No — message gone after ack

Limited (in-memory, bounded)

Fan-out

Many independent consumer groups

Needs SNS/exchanges to fan out

Consumer groups, single node

Best for

Event streaming, CDC, analytics ingest

Simple task queues with DLQ/retries

Queues without standing up Kafka