Intermediate · Deep dive

Delivery semantics

At-most-once, at-least-once, effectively-once, ordering, dedupe, and retry contracts.

~10 min read

Exactly-once is a lie you tell clients; at-least-once + idempotent consumers is the truth. Even Kafka's "exactly-once" is exactly-once *within Kafka* — not end to end.

Read this if your last attempt…

You wrote "exactly-once delivery" and moved on
You can't explain how Kafka transactions actually work
You don't know why ordering across partitions is not a thing
You haven't thought about consumer replay on failure

The concept

Three delivery semantics; only two are practical.

At-most-once — fire and forget. Message may be lost; never duplicated. Acceptable for metrics, non-critical notifications. Rare at interview scale.
At-least-once — message is guaranteed delivered at least once. Duplicates happen on retry. This is the pragmatic default — combine with idempotent consumers for correctness.
Exactly-once — message is delivered and processed exactly once. Impossible in the general distributed case. Kafka offers transactional exactly-once within one Kafka cluster: producer → partitions and consumer-offset commit are in one atomic transaction. The moment you side-effect to an external system (DB, API, email), you're back to at-least-once end-to-end.

Architecture diagram · Producer → queue → consumer, with DLQ

The queue decouples lifetimes: producer commits the work, consumer processes on its own schedule. DLQ catches what can't be processed after N attempts.

Semantics — what you get, what it costs.

Semantics	Duplicates?	Loss?	Typical use
At-most-once	No	Yes (rare but possible)	Metrics, non-critical analytics
At-least-once	Yes — dedup in consumer	No	Default for business events
Exactly-once within Kafka	No (within Kafka)	No	Kafka stream pipelines (compute in Kafka)
Exactly-once end-to-end	— (not achievable generally)	—	Approximation: at-least-once + idempotent consumer

How interviewers grade this

You say "at-least-once + idempotent" — never "exactly-once".
You name the partition key and what it orders.
You have a DLQ with a max-receive threshold and a human alert.
You can explain Kafka transactions and why they're not end-to-end exactly-once.
Your consumers store a processed-offset or dedup key; retries don't double-apply.

Variants

At-least-once + idempotent consumer

Broker redelivers on failure; consumer dedups by message id or business key.

The pragmatic default. Consumer keeps a dedup set (Redis, DB table) of processed ids; skips duplicates. Combined with the outbox pattern upstream, this is the battle-tested shape for event-driven systems.
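
A minimal sketch of the dedup step, assuming each message carries a stable `id` field. `DedupStore` and `handle` are hypothetical names, and the in-memory dict only stands in for the Redis or DB-backed store mentioned above.

```python
import time
from typing import Callable, Dict


class DedupStore:
    """In-memory stand-in for a Redis set or DB table of processed ids (assumption)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def seen(self, message_id: str) -> bool:
        expiry = self._expires_at.get(message_id)
        if expiry is None:
            return False
        if expiry <= self._clock():
            # Entry outlived its TTL: forget it and treat the id as new.
            del self._expires_at[message_id]
            return False
        return True

    def mark(self, message_id: str) -> None:
        self._expires_at[message_id] = self._clock() + self._ttl


def handle(message: dict, store: DedupStore, apply_effect: Callable[[dict], None]) -> bool:
    """Apply the side effect at most once per message id. Returns True if applied."""
    if store.seen(message["id"]):
        return False  # duplicate delivery: skip the side effect
    apply_effect(message)
    store.mark(message["id"])  # mark only after the effect succeeded
    return True
```

Marking after the effect means a crash between the two steps can still repeat the effect once, and two concurrent consumers can both pass the `seen` check. A shared store would typically close that gap with an atomic set-if-absent (for example Redis `SET key value NX EX ttl`) or by keeping the dedup row and the effect in the same DB transaction.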

Pros

+Simple to reason about
+Works across any broker
+Correct with modest consumer discipline

Cons

−Dedup storage cost
−Consumer must be idempotent — not always natural

Choose this variant when

Event-driven systems
Cross-service integration
Any at-least-once broker

Kafka transactional (EOS within Kafka)

Producer + consumer-offset commit in one Kafka transaction.

Use when the processing stays within Kafka — read from topic A, write to topic B, commit offset. The whole thing is atomic. The moment you touch an external DB, re-add dedup logic because Kafka transactions don't extend there.
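
To make the atomicity concrete, here is a toy in-memory model of read A, write B, commit offset. It is not the Kafka client API: `ToyLog`, `ToyTransaction`, and `copy_uppercased` are hypothetical names, and the model simply buffers writes until commit.

```python
from typing import Dict, List, Tuple


class ToyLog:
    """Toy model of a transactional log; NOT the Kafka client API."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = {}
        self.committed_offsets: Dict[Tuple[str, str], int] = {}

    def transaction(self) -> "ToyTransaction":
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, log: ToyLog) -> None:
        self._log = log
        self._writes: List[Tuple[str, str]] = []
        self._offsets: Dict[Tuple[str, str], int] = {}

    def send(self, topic: str, value: str) -> None:
        self._writes.append((topic, value))  # buffered, not yet visible

    def send_offset(self, group: str, topic: str, next_offset: int) -> None:
        self._offsets[(group, topic)] = next_offset

    def commit(self) -> None:
        # Output records and the consumer offset become visible together.
        for topic, value in self._writes:
            self._log.topics.setdefault(topic, []).append(value)
        self._log.committed_offsets.update(self._offsets)

    def abort(self) -> None:
        self._writes.clear()
        self._offsets.clear()


def copy_uppercased(log: ToyLog, group: str, fail: bool = False) -> None:
    """Read one record from topic A, write it to topic B, commit the offset."""
    offset = log.committed_offsets.get((group, "A"), 0)
    records = log.topics.get("A", [])
    if offset >= len(records):
        return
    tx = log.transaction()
    try:
        tx.send("B", records[offset].upper())
        tx.send_offset(group, "A", offset + 1)
        if fail:
            raise RuntimeError("simulated crash before commit")
        tx.commit()
    except RuntimeError:
        tx.abort()  # neither the output record nor the offset moves
```

After an abort the next run re-reads the same input record, and because the earlier output never became visible, topic B ends up with exactly one copy. In real Kafka, aborted records are still written to the log but are hidden from consumers that read with `read_committed` isolation.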

Pros

+True exactly-once for in-Kafka pipelines
+Simplifies Kafka-native streaming jobs

Cons

−Throughput penalty (roughly 10–30%, depending on transaction size and commit interval)
−Only covers Kafka; external effects still need idempotency
−Operational complexity

Choose this variant when

Streaming jobs with Kafka in, Kafka out
Where transactions are a hard requirement and performance can tolerate it

Outbox + at-least-once publish

Write business + outbox row in one DB tx; CDC publishes; consumer dedups.

The workhorse pattern for services that own a DB. Guarantees the event is published iff the business change committed (no dual-write race). Consumers still must dedup because the publisher might retry.
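
A rough sketch of the outbox write and a relay, using SQLite from the standard library so it runs anywhere. The table layout, `record_payment`, and `relay_once` are assumptions for illustration, and the polling relay is a stand-in for a CDC pipeline, not how CDC itself works.

```python
import json
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id TEXT PRIMARY KEY, user_id TEXT, amount_cents INTEGER)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
    )
    return conn


def record_payment(conn: sqlite3.Connection, payment_id: str, user_id: str,
                   amount_cents: int) -> None:
    """Write the business row and the outbox row in one transaction."""
    payload = json.dumps({"payment_id": payment_id, "user_id": user_id})
    with conn:  # commits on success, rolls back both inserts on any error
        conn.execute(
            "INSERT INTO payments (id, user_id, amount_cents) VALUES (?, ?, ?)",
            (payment_id, user_id, amount_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("payment.completed", payload),
        )


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Publish unpublished outbox rows in order. Returns how many were pending."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

If the relay crashes after `publish` but before marking the row, the next pass publishes the same event again. That is the at-least-once publish, and it is why the consumer still dedups.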

Pros

+Solves the DB-vs-broker dual-write problem
+Works with any broker
+Clean operational model

Cons

−Latency floor ~1 s for CDC to publish
−Adds a table and a CDC pipeline

Choose this variant when

Service owns a DB and emits events
Any ordered event stream downstream

Worked example

Scenario: user completes a payment → downstream wants to send a receipt email, update loyalty, index for search.

Publish:

Payments service writes payments row + outbox row in one Postgres transaction.
Debezium CDC tails outbox, publishes payment.completed to Kafka, partitioned by user_id.
Order preserved per user.
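
Why partitioning by user_id preserves per-user order can be sketched with a stable hash. `partition_for` and `route` are hypothetical helpers, and CRC32 is only a stand-in: Kafka's Java client hashes keys with murmur2, but the principle (same key, same partition) is the same.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key modulo the partition count (CRC32 for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Tuple[str, str]],
          num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (user_id, event) to the partition chosen by its user_id."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for user_id, event in events:
        partitions[partition_for(user_id, num_partitions)].append((user_id, event))
    return dict(partitions)
```

Every event for one user lands in the same partition in publish order, so a single consumer sees that user's events in sequence. Events for different users may sit in different partitions, and nothing orders those relative to each other.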

Consumers (each a separate consumer group):

1. Email service: reads payment.completed, constructs email, sends via SendGrid. Dedup by payment_id in Redis (7-day TTL). Retries on failure; DLQ after 5 attempts.
2. Loyalty service: reads same topic, increments points in loyalty DB. Idempotency key = payment_id ensures double-consumption doesn't double-credit.
3. Search indexer: reads same topic, upserts to Elasticsearch. Upsert is naturally idempotent.
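
One way the loyalty consumer's idempotency key could look, assuming the dedup record lives in the same database as the points balance. The schema and `credit_points` are hypothetical, and SQLite stands in for the loyalty DB.

```python
import sqlite3


def open_loyalty_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE processed_payments (payment_id TEXT PRIMARY KEY)")
    return conn


def credit_points(conn: sqlite3.Connection, payment_id: str, user_id: str,
                  points: int) -> bool:
    """Credit points once per payment_id. Returns False on a duplicate delivery."""
    try:
        with conn:  # dedup row and balance update commit or roll back together
            # The primary key rejects a second insert of the same payment_id.
            conn.execute(
                "INSERT INTO processed_payments (payment_id) VALUES (?)", (payment_id,)
            )
            conn.execute(
                "INSERT OR IGNORE INTO points (user_id, balance) VALUES (?, 0)", (user_id,)
            )
            conn.execute(
                "UPDATE points SET balance = balance + ? WHERE user_id = ?",
                (points, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already processed: nothing was changed
```

Because the idempotency record and the credit share one transaction, there is no window in which the points are credited but the payment_id is not yet recorded.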

Failure modes:

Email provider flaky → broker redelivers; after 5 fails → DLQ; PagerDuty alert; engineer inspects.
Loyalty DB unavailable → consumer pauses (doesn't commit offset); retries until success or circuit-breaks. Messages accumulate in the topic but aren't lost.
Payment and outbox rows committed but CDC publish delayed → eventual consistency, typically <1 s; downstream sees the event after the CDC cycle.
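
The retry-then-DLQ path from the first failure mode can be sketched as below. `deliver`, the list-backed `dlq`, and the `on_dead_letter` alert hook are assumptions for illustration; a real consumer would also back off between attempts.

```python
from typing import Callable, List, Optional


def deliver(message: dict,
            handler: Callable[[dict], None],
            dlq: List[dict],
            on_dead_letter: Callable[[dict], None],
            max_attempts: int = 5) -> bool:
    """Try the handler up to max_attempts times; dead-letter the message if all fail."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # sketch only: treat every failure as retryable
            last_error = exc
    # Attempts exhausted: park the message for inspection and raise the alert.
    dlq.append({"message": message, "attempts": max_attempts, "error": repr(last_error)})
    on_dead_letter(message)
    return False
```

The handler may run several times for one message, which is another reason it has to be idempotent.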

Non-negotiable: every consumer is idempotent. Every consumer has a DLQ. Every DLQ has an alert.

Good vs bad answer

Interviewer probe

“Do you guarantee exactly-once processing?”

Weak answer

"Yes, we use Kafka which is exactly-once."

Strong answer

"No one does end-to-end exactly-once — it's not achievable in the general distributed case. We do at-least-once delivery with idempotent consumers. Each message carries a stable id; consumers dedup in a Redis set before applying the side effect. Kafka's transactional API gives us exactly-once within Kafka for in-cluster streaming, but as soon as we write to an external DB or API, we're back to at-least-once end-to-end — which is why the consumer dedups."

Why it wins: Corrects the myth, names the actual shape (at-least-once + idempotent), explains the scope of Kafka transactions.

Interview playbook2–3 min whenever async messaging or streams enter the design

When it comes up

You introduced a queue, a stream, or any async consumer
The interviewer asks "do you guarantee exactly-once?"
Payment / order events where a duplicate has real consequences
Anything that needs a defined ordering guarantee

Order of reveal

1. State the default. At-least-once delivery with idempotent consumers. That is the honest, achievable contract — I will not claim exactly-once end-to-end.
2. Name the partition key. I partition by the key whose order matters — user_id, account_id, aggregate_id — and accept that events across different keys may reorder.
3. Commit offset after the side effect. The consumer commits its offset only after the side effect succeeds, so a crash mid-process re-delivers rather than silently drops.
4. DLQ + max-receive. Every consumer has a dead-letter queue with a max-receive count and a human alert, so a poison message cannot jam the partition forever.
5. Replay story. A log (Kafka) replays by offset; a queue (SQS) deletes on ack, so replay needs an upstream source — outbox or CDC.

Signature phrases

“At-least-once plus idempotent consumers — never exactly-once end-to-end.” — Avoids the single most common delivery-semantics trap.
“Kafka's exactly-once is within Kafka; the external write puts you back to at-least-once.” — Shows you know the precise scope of EOS.
“Partition by the key whose order matters; accept cross-key reordering.” — Demonstrates you understand ordering is per-partition, not global.
“Every consumer has a DLQ, a max-receive, and an alert.” — Operational maturity interviewers look for at senior+.

Likely follow-ups

“The interviewer insists the system must be exactly-once. What do you say?”

I reframe: what they want is effectively-once, which is at-least-once delivery plus an idempotent consumer. The broker may deliver a message twice; the consumer dedups on a stable id (Redis set or DB unique index) so the effect happens once. Kafka transactions give true exactly-once for a Kafka→Kafka pipeline, but the moment we write to an external DB or call an API, idempotency is what actually protects us.

“Where does the dedup state live, and how large does it get?”

A dedup store keyed by event id: a Redis set with a TTL, or a DB table with a unique index. The TTL must exceed the maximum redelivery / replay window (often 24–72h). Size ≈ events retained in the window × bytes per entry; at tens of millions of events per day that reaches gigabytes, which is why the TTL matters: you reclaim the space once replay is impossible.
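
The sizing arithmetic, worked through as a back-of-envelope helper. The 100 bytes per entry (key plus store overhead) and the event rate are assumed figures for illustration, not measurements.

```python
def dedup_store_bytes(events_per_day: int, ttl_hours: float, bytes_per_entry: int) -> int:
    """Approximate steady-state size: entries alive within the TTL times entry size."""
    entries = events_per_day * ttl_hours / 24
    return int(entries * bytes_per_entry)


# Assumed workload: 10 million events/day, 72 h TTL, ~100 bytes per entry.
size = dedup_store_bytes(events_per_day=10_000_000, ttl_hours=72, bytes_per_entry=100)
print(size / 1e9)  # 3.0 (GB)
```

Halving the TTL halves the footprint, so the TTL should be set to the replay window plus a margin rather than chosen arbitrarily large.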

“A poison message keeps failing. Walk me through what happens.”

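In an SQS-style queue, the broker redelivers after the visibility timeout and increments a receive counter each time; once the count crosses the max-receive threshold, the message is routed to the DLQ instead of the main queue. In a classic Kafka consumer group the broker keeps no per-message receive count, so the consumer tracks attempts itself and publishes the failing message to a dead-letter topic before committing past it. Either way, an alert pages the owning team and the DLQ preserves the message for inspection. Critically, this stops one bad message from blocking every message behind it.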

Common mistakes

Claiming exactly-once

It's an interview trap. Say at-least-once + idempotent. Kafka transactions are scoped to Kafka only.

Assuming global ordering

Across partitions, there is no order. Partition by the key whose ordering matters; accept cross-key reordering.

No DLQ

A poison message jams the partition forever. Always: max-receive + DLQ + alert.

Committing offset before side effect (Advanced)

Consumer commits, then crashes before the side effect completes — message is lost. Commit offset after the side effect, accepting duplicate risk.
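
A small simulation of the two orderings, assuming a list-backed log and a crash injected between the two steps. `run_consumer` and `CrashBetweenSteps` are hypothetical names used only for this sketch.

```python
from typing import List


class CrashBetweenSteps(Exception):
    """Simulated process crash between the commit and the side effect."""


def run_consumer(log: List[str], committed: int, effects: List[str],
                 commit_first: bool, crash_on: str = "") -> int:
    """Process the log from `committed` onward and return the new committed offset.

    When the message equal to `crash_on` is reached, the consumer "crashes"
    after the first of its two steps and before the second.
    """
    offset = committed
    try:
        while offset < len(log):
            msg = log[offset]
            if commit_first:
                committed = offset + 1      # step 1: commit the offset
                if msg == crash_on:
                    raise CrashBetweenSteps
                effects.append(msg)         # step 2: side effect
            else:
                effects.append(msg)         # step 1: side effect
                if msg == crash_on:
                    raise CrashBetweenSteps
                committed = offset + 1      # step 2: commit the offset
            offset += 1
    except CrashBetweenSteps:
        pass  # the restart resumes from whatever offset was committed
    return committed
```

With `commit_first=True`, a crash on "b" followed by a restart yields effects a, c: the message is gone. With `commit_first=False`, the same crash yields a, b, b, c: nothing is lost, and the duplicate is what the idempotent consumer absorbs.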

Practice drills

Explain Kafka's "exactly-once" in one paragraph.

The producer's writes and the consumer-offset commit go into one Kafka transaction. If the transaction aborts, none of them become visible to downstream consumers that read only committed data. This makes a Kafka→Kafka pipeline exactly-once: read topic A, write topic B, commit offset, atomically. It does NOT extend to external sinks: a write to Postgres made during a Kafka transaction is not atomic with Kafka. For effectively-once results against an external DB, you still need idempotent writes (upsert by key, conditional update, or dedup table).

Your consumer processed a message, updated the DB, then crashed before committing the offset. What happens next?

Broker redelivers the message. Consumer processes it again. If your DB update was idempotent (upsert by key), the second attempt is a no-op. If not, you now have a duplicate effect. That's why at-least-once + idempotent matters: crash-between-side-effect-and-commit is the common case, not the rare case.

Interviewer: "you need total ordering of events across all users. How?"

One partition. That's the only way. Which means throughput is capped at what one consumer can do. Usually the interviewer is testing whether you recognise the scaling cost. Push back: "do you really need cross-user ordering, or per-user ordering?" — if per-user, partition by user_id and scale. Cross-user global order almost never survives scrutiny.

Cheat sheet

• Default: at-least-once + idempotent consumer.
• Never claim exactly-once end-to-end.
• Partition key decides which ordering is preserved.
• Commit offset after side effect, not before.
• DLQ + max-receive + alert. Always.
• Dedup storage: Redis set or DB unique index, scoped to the event id.
