Delivery semantics
At-most-once, at-least-once, effectively-once, ordering, dedupe, and retry contracts.
Exactly-once is a lie you tell clients; at-least-once + idempotent consumers is the truth. Even Kafka's "exactly-once" is exactly-once *within Kafka* — not end to end.
Read this if your last attempt…
- You wrote "exactly-once delivery" and moved on
- You can't explain how Kafka transactions actually work
- You don't know why ordering across partitions is not a thing
- You haven't thought about consumer replay on failure
The concept
Three delivery semantics; only two are practical.
- At-most-once — fire and forget. Message may be lost; never duplicated. Acceptable for metrics, non-critical notifications. Rare at interview scale.
- At-least-once — message is guaranteed delivered at least once. Duplicates happen on retry. This is the pragmatic default — combine with idempotent consumers for correctness.
- Exactly-once — message is delivered and processed exactly once. Impossible in the general distributed case. Kafka offers transactional exactly-once within one Kafka cluster: producer → partitions and consumer-offset commit are in one atomic transaction. The moment you side-effect to an external system (DB, API, email), you're back to at-least-once end-to-end.
The queue decouples lifetimes: the producer commits the work, and the consumer processes it on its own schedule. A dead-letter queue (DLQ) catches what can't be processed after N attempts.
Semantics — what you get, what it costs.
| Semantics | Duplicates? | Loss? | Typical use |
|---|---|---|---|
| At-most-once | No | Yes (rare but possible) | Metrics, non-critical analytics |
| At-least-once | Yes — dedup in consumer | No | Default for business events |
| Exactly-once within Kafka | No (within Kafka) | No | Kafka stream pipelines (compute in Kafka) |
| Exactly-once end-to-end | — (not achievable generally) | — | Approximation: at-least-once + idempotent consumer |
How interviewers grade this
- You say "at-least-once + idempotent" — never "exactly-once".
- You name the partition key and what it orders.
- You have a DLQ with a max-receive threshold and a human alert.
- You can explain Kafka transactions and why they're not end-to-end exactly-once.
- Your consumers store a processed-offset or dedup key; retries don't double-apply.
Variants
At-least-once + idempotent consumer
Broker redelivers on failure; consumer dedups by message id or business key.
The pragmatic default. Consumer keeps a dedup set (Redis, DB table) of processed ids; skips duplicates. Combined with the outbox pattern upstream, this is the battle-tested shape for event-driven systems.
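A minimal sketch of the dedup step, assuming an in-memory set stands in for the Redis or DB dedup store. The `Message` and `IdempotentConsumer` names are hypothetical and do not come from any broker client.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str  # stable id assigned by the producer
    payload: str


@dataclass
class IdempotentConsumer:
    # Stand-in for the dedup store (a Redis set or DB table in a real system).
    processed_ids: set = field(default_factory=set)
    # Stand-in for the side effect: every applied payload is recorded here.
    applied: list = field(default_factory=list)

    def handle(self, message: Message) -> bool:
        """Apply the message once; return False if it was a duplicate."""
        if message.message_id in self.processed_ids:
            return False  # redelivery: skip the side effect
        self.applied.append(message.payload)
        self.processed_ids.add(message.message_id)
        return True


consumer = IdempotentConsumer()
first = consumer.handle(Message("evt-1", "credit 10 points"))
redelivered = consumer.handle(Message("evt-1", "credit 10 points"))
```

In this sketch the side effect and the dedup write are two separate steps, so a crash between them could still double-apply. Recording the id in the same transaction as the side effect closes that gap when both live in one database.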
Pros
- Simple to reason about
- Works across any broker
- Correct with modest consumer discipline
Cons
- Dedup storage cost
- Consumer must be idempotent, which is not always natural
Choose this variant when
- Event-driven systems
- Cross-service integration
- Any at-least-once broker
Kafka transactional (EOS within Kafka)
Producer + consumer-offset commit in one Kafka transaction.
Use when the processing stays within Kafka — read from topic A, write to topic B, commit offset. The whole thing is atomic. The moment you touch an external DB, re-add dedup logic because Kafka transactions don't extend there.
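The atomicity described above can be illustrated with a toy model. This is not the Kafka client API: `ToyBroker`, `ToyTransaction`, and `process_batch` are hypothetical names, and the model only shows the one property that matters here, namely that output records and the offset commit become visible together or not at all.

```python
class ToyBroker:
    """Toy model of one cluster: an output topic plus a consumer offset."""

    def __init__(self):
        self.output_topic = []
        self.committed_offset = 0


class ToyTransaction:
    """Buffers output records and the offset commit, then applies both or neither."""

    def __init__(self, broker):
        self.broker = broker
        self.pending_records = []
        self.pending_offset = None

    def send(self, record):
        self.pending_records.append(record)

    def send_offset(self, next_offset):
        self.pending_offset = next_offset

    def commit(self):
        self.broker.output_topic.extend(self.pending_records)
        if self.pending_offset is not None:
            self.broker.committed_offset = self.pending_offset

    def abort(self):
        self.pending_records = []
        self.pending_offset = None


def process_batch(broker, input_topic, crash_before_commit=False):
    """Read topic A from the committed offset, write topic B, commit the offset."""
    txn = ToyTransaction(broker)
    for record in input_topic[broker.committed_offset:]:
        txn.send(record.upper())
    txn.send_offset(len(input_topic))
    if crash_before_commit:
        txn.abort()  # nothing from this attempt becomes visible
        return
    txn.commit()


broker = ToyBroker()
topic_a = ["a", "b"]
process_batch(broker, topic_a, crash_before_commit=True)  # failed attempt
process_batch(broker, topic_a)                            # retry succeeds
```

After the failed attempt and the retry, the output topic holds each record once. A write to an external DB inside `process_batch` would not be rolled back by `abort`, which is the gap the text describes.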
Pros
- True exactly-once for in-Kafka pipelines
- Simplifies Kafka-native streaming jobs
Cons
- ~10–30% throughput penalty
- Only covers Kafka; external effects still need idempotency
- Operational complexity
Choose this variant when
- Streaming jobs with Kafka in, Kafka out
- Where transactions are a hard requirement and performance can tolerate it
Outbox + at-least-once publish
Write business + outbox row in one DB tx; CDC publishes; consumer dedups.
The workhorse pattern for services that own a DB. Guarantees the event is published iff the business change committed (no dual-write race). Consumers still must dedup because the publisher might retry.
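A sketch of the outbox write and a polling relay, under loud assumptions: SQLite in memory stands in for the service's database, a polling function stands in for CDC, and the table layout, `complete_payment`, and `relay_once` are all hypothetical.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        "payment_id TEXT PRIMARY KEY, user_id TEXT, amount_cents INTEGER)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT, "
        "payload TEXT, published INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def complete_payment(conn, payment_id, user_id, amount_cents):
    """Write the business row and the outbox row in one transaction."""
    payload = json.dumps(
        {"payment_id": payment_id, "user_id": user_id, "amount_cents": amount_cents}
    )
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO payments (payment_id, user_id, amount_cents) VALUES (?, ?, ?)",
            (payment_id, user_id, amount_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("payment.completed", payload),
        )


def relay_once(conn, publish):
    """Publish unpublished outbox rows in order; return how many were sent."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

If the relay crashes after `publish` but before marking the row, the next pass publishes the same event again. That is the publisher retry the paragraph above refers to, and it is why consumers still dedup.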
Pros
- Solves the DB-vs-broker dual-write problem
- Works with any broker
- Clean operational model
Cons
- Latency floor ~1 s for CDC to publish
- Adds a table and a CDC pipeline
Choose this variant when
- Service owns a DB and emits events
- Any ordered event stream downstream
Worked example
Scenario: user completes a payment → downstream wants to send a receipt email, update loyalty, index for search.
Publish:
- Payments service writes payments row + outbox row in one Postgres transaction.
- Debezium CDC tails outbox, publishes payment.completed to Kafka, partitioned by user_id.
- Order preserved per user.
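Why partitioning by user_id preserves per-user order can be shown with a small routing sketch. The hash here is MD5 purely for illustration; it is an assumption, not the partitioner any Kafka client uses, and `partition_for` and `route` are hypothetical helpers.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (illustrative hash only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events, num_partitions):
    """Append each event to the partition chosen by its user_id."""
    partitions = {p: [] for p in range(num_partitions)}
    for event in events:
        partitions[partition_for(event["user_id"], num_partitions)].append(event)
    return partitions


events = [
    {"user_id": "u1", "seq": 1},
    {"user_id": "u2", "seq": 1},
    {"user_id": "u1", "seq": 2},
    {"user_id": "u2", "seq": 2},
]
partitions = route(events, num_partitions=4)
```

Every event for one user lands in the same partition in publish order. Events for different users may sit in different partitions, so nothing orders them relative to each other.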
Consumers (each a separate consumer group):
1. Email service: reads payment.completed, constructs email, sends via SendGrid. Dedup by payment_id in Redis (7-day TTL). Retries on failure; DLQ after 5 attempts.
2. Loyalty service: reads same topic, increments points in loyalty DB. Idempotency key = payment_id ensures double-consumption doesn't double-credit.
3. Search indexer: reads same topic, upserts to Elasticsearch. Upsert is naturally idempotent.
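For the loyalty consumer, one way to make the idempotency key effective is to record payment_id in the same transaction as the points update. The sketch below assumes SQLite in memory as a stand-in for the loyalty DB; the schema and `credit_points` are hypothetical.

```python
import sqlite3


def open_loyalty_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE processed_payments (payment_id TEXT PRIMARY KEY)")
    return conn


def credit_points(conn, payment_id, user_id, points) -> bool:
    """Credit points once per payment_id; return False on a duplicate delivery."""
    try:
        with conn:  # one transaction: dedup row and balance change commit together
            # The primary key rejects a payment_id that was already processed.
            conn.execute(
                "INSERT INTO processed_payments (payment_id) VALUES (?)",
                (payment_id,),
            )
            updated = conn.execute(
                "UPDATE points SET balance = balance + ? WHERE user_id = ?",
                (points, user_id),
            )
            if updated.rowcount == 0:
                conn.execute(
                    "INSERT INTO points (user_id, balance) VALUES (?, ?)",
                    (user_id, points),
                )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate: the transaction rolled back, balance untouched
```

Because the dedup row and the balance change commit together, a crash between them cannot leave one without the other. The email consumer cannot get this guarantee, since the send is an external call, which is why it relies on a separate dedup store.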
Failure modes:
- Email provider flaky → broker redelivers; after 5 fails → DLQ; PagerDuty alert; engineer inspects.
- Loyalty DB unavailable → consumer pauses (doesn't commit offset); retries until success or circuit-breaks. Messages accumulate in the topic but aren't lost.
- Payment and outbox rows committed but CDC publish delayed → eventual consistency, typically <1 s; downstream sees the event after the CDC cycle.
Non-negotiable: every consumer is idempotent. Every consumer has a DLQ. Every DLQ has an alert.
Good vs bad answer
Interviewer probe
“Do you guarantee exactly-once processing?”
Weak answer
"Yes, we use Kafka which is exactly-once."
Strong answer
"No one does end-to-end exactly-once — it's not achievable in the general distributed case. We do at-least-once delivery with idempotent consumers. Each message carries a stable id; consumers dedup in a Redis set before applying the side effect. Kafka's transactional API gives us exactly-once within Kafka for in-cluster streaming, but as soon as we write to an external DB or API, we're back to at-least-once end-to-end — which is why the consumer dedups."
Why it wins: Corrects the myth, names the actual shape (at-least-once + idempotent), explains the scope of Kafka transactions.
When it comes up
- You introduced a queue, a stream, or any async consumer
- The interviewer asks "do you guarantee exactly-once?"
- Payment / order events where a duplicate has real consequences
- Anything that needs a defined ordering guarantee
Order of reveal
1. State the default. At-least-once delivery with idempotent consumers. That is the honest, achievable contract; I will not claim exactly-once end-to-end.
2. Name the partition key. I partition by the key whose order matters (user_id, account_id, aggregate_id) and accept that events across different keys may reorder.
3. Commit offset after the side effect. The consumer commits its offset only after the side effect succeeds, so a crash mid-process re-delivers rather than silently drops.
4. DLQ + max-receive. Every consumer has a dead-letter queue with a max-receive count and a human alert, so a poison message cannot jam the partition forever.
5. Replay story. A log (Kafka) replays by offset; a queue (SQS) deletes on ack, so replay needs an upstream source such as an outbox or CDC.
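The replay difference in the last step can be sketched with two toy brokers. `ToyLog` and `ToyQueue` are hypothetical models, not Kafka or SQS clients; they capture only retention behaviour.

```python
from collections import deque


class ToyLog:
    """Log-style broker: records stay put; the consumer chooses the offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        return list(self.records[offset:])


class ToyQueue:
    """Queue-style broker: an acknowledged message is deleted."""

    def __init__(self):
        self.pending = deque()

    def send(self, message):
        self.pending.append(message)

    def receive_and_ack(self):
        return self.pending.popleft() if self.pending else None


log, queue = ToyLog(), ToyQueue()
for event in ("e1", "e2", "e3"):
    log.append(event)
    queue.send(event)

first_pass = log.read_from(0)
replayed = log.read_from(0)  # rewinding the offset reads the same records again
drained = [queue.receive_and_ack() for _ in range(3)]
after_drain = queue.receive_and_ack()  # nothing left to replay from the queue
```

Once the queue is drained there is nothing to re-read, so a replay has to be regenerated from the upstream source.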
Signature phrases
- “At-least-once plus idempotent consumers — never exactly-once end-to-end.” — Avoids the single most common delivery-semantics trap.
- “Kafka's exactly-once is within Kafka; the external write puts you back to at-least-once.” — Shows you know the precise scope of EOS.
- “Partition by the key whose order matters; accept cross-key reordering.” — Demonstrates you understand ordering is per-partition, not global.
- “Every consumer has a DLQ, a max-receive, and an alert.” — Operational maturity interviewers look for at senior+.
Likely follow-ups
“The interviewer insists the system must be exactly-once. What do you say?”
I reframe: what they want is effectively-once, which is at-least-once delivery plus an idempotent consumer. The broker may deliver a message twice; the consumer dedups on a stable id (Redis set or DB unique index) so the effect happens once. Kafka transactions give true exactly-once for a Kafka→Kafka pipeline, but the moment we write to an external DB or call an API, idempotency is what actually protects us.
“Where does the dedup state live, and how large does it get?”
A dedup store keyed by event id — Redis set with a TTL, or a DB table with a unique index. The TTL must exceed the maximum redelivery / replay window (often 24–72h). Size = events-per-window × key size; for millions/day that is gigabytes, which is why the TTL matters — you reclaim it once replay is impossible.
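The sizing formula can be checked with quick arithmetic. The inputs below are illustrative assumptions, not measurements: 5 million events per day, a 72-hour window, and roughly 100 bytes per stored key including store overhead.

```python
def dedup_store_bytes(events_per_day: int, window_hours: int, bytes_per_key: int) -> int:
    """Size = events in the retention window x bytes per key."""
    keys_in_window = events_per_day * window_hours // 24
    return keys_in_window * bytes_per_key


# 5,000,000 events/day over 72 h is 15,000,000 keys; at 100 bytes each, 1.5 GB.
estimate = dedup_store_bytes(events_per_day=5_000_000, window_hours=72, bytes_per_key=100)
```

Halving the TTL halves the footprint, so the window should be as short as the maximum redelivery or replay horizon allows.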
“A poison message keeps failing. Walk me through what happens.”
The broker redelivers after the visibility timeout. A receive counter increments each time; once it crosses the max-receive threshold, the message is routed to the DLQ instead of the main queue, and an alert pages the owning team. The DLQ preserves the message for inspection. Critically, this stops one bad message from blocking every message behind it on the partition.
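The redelivery loop above can be sketched as follows, assuming an in-memory deque as the main queue and a list as the DLQ. `drain`, `handler`, and the message shape are hypothetical, and the alert is a plain callback rather than a real pager integration.

```python
from collections import deque


def drain(queue, handler, dead_letters, max_receive=5, alert=print):
    """Deliver until the queue is empty; park repeat failures in the DLQ."""
    receive_counts = {}
    while queue:
        message = queue.popleft()
        message_id = message["id"]
        receive_counts[message_id] = receive_counts.get(message_id, 0) + 1
        try:
            handler(message)
        except Exception as error:
            if receive_counts[message_id] >= max_receive:
                dead_letters.append(message)  # preserved for inspection
                alert(f"DLQ: {message_id} failed {max_receive} times: {error}")
            else:
                queue.append(message)  # redeliver later
    return receive_counts


handled, dead_letters, alerts = [], [], []


def handler(message):
    if message["body"] == "poison":
        raise ValueError("cannot parse")
    handled.append(message["id"])


main_queue = deque([{"id": "m1", "body": "poison"}, {"id": "m2", "body": "ok"}])
counts = drain(main_queue, handler, dead_letters, max_receive=5, alert=alerts.append)
```

The healthy message behind the poison one is still processed, and the poison message ends up in the DLQ with exactly one alert raised.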
Common mistakes
Claiming exactly-once. It's an interview trap. Say at-least-once + idempotent. Kafka transactions are scoped to Kafka only.
Assuming global ordering. Across partitions, there is no order. Partition by the key whose ordering matters; accept cross-key reordering.
Skipping the DLQ. Without one, a poison message jams the partition forever. Always: max-receive + DLQ + alert.
Committing the offset before the side effect. The consumer commits, then crashes before the side effect completes, and the message is lost. Commit offset after the side effect, accepting duplicate risk.
Practice drills
Explain Kafka's "exactly-once" in one paragraph.
The producer writes its output messages and the consumer-offset commit as one Kafka transaction. If the transaction aborts, none of it is visible to downstream consumers that read with isolation.level=read_committed. This makes a Kafka→Kafka pipeline exactly-once: read topic A, write topic B, commit offset, atomically. It does NOT extend to external sinks: writing to Postgres inside a Kafka transaction is not atomic with Kafka. For effectively-once results against an external DB, you still need idempotent writes (upsert by key, conditional update, or dedup table).
Your consumer processed a message, updated the DB, then crashed before committing the offset. What happens next?
Broker redelivers the message. Consumer processes it again. If your DB update was idempotent (upsert by key), the second attempt is a no-op. If not, you now have a duplicate effect. That's why at-least-once + idempotent matters: crash-between-side-effect-and-commit is the common case, not the rare case.
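The crash window can be traced with a toy consumer loop that compares the two commit orders. `consume` is a hypothetical simulation in which "crashing" means returning early with whatever offset was committed so far.

```python
def consume(log, start_offset, apply, commit_first, crash_at=None):
    """Process the log from start_offset; return the last committed offset."""
    committed = start_offset
    for offset in range(start_offset, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return committed  # crash after commit, before the side effect
            apply(log[offset])
        else:
            apply(log[offset])
            if offset == crash_at:
                return committed  # crash after the side effect, before commit
            committed = offset + 1
    return committed


log = ["a", "b", "c"]

# Commit before the side effect: "b" is committed but never applied (loss).
lossy = []
resume = consume(log, 0, lossy.append, commit_first=True, crash_at=1)
consume(log, resume, lossy.append, commit_first=True)

# Commit after the side effect: "b" is applied twice (duplicate, no loss).
duplicated = []
resume = consume(log, 0, duplicated.append, commit_first=False, crash_at=1)
consume(log, resume, duplicated.append, commit_first=False)
```

The first run ends with "b" missing; the second ends with "b" applied twice. Only the second failure is recoverable, and an idempotent `apply` turns the repeat into a no-op.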
Interviewer: "you need total ordering of events across all users. How?"
One partition. That's the only way. Which means throughput is capped at what one consumer can do. Usually the interviewer is testing whether you recognise the scaling cost. Push back: "do you really need cross-user ordering, or per-user ordering?" — if per-user, partition by user_id and scale. Cross-user global order almost never survives scrutiny.
Cheat sheet
- Default: at-least-once + idempotent consumer.
- Never claim exactly-once end-to-end.
- Partition key decides which ordering is preserved.
- Commit offset after side effect, not before.
- DLQ + max-receive + alert. Always.
- Dedup storage: Redis set or DB unique index, scoped to the event id.