core · deep dive

Idempotency

Retry-safe writes with idempotency keys, dedupe windows, and response replay.

~15 min read

Every retry is a test of your idempotency design. Networks drop, clients retry, at-least-once queues redeliver — if your write path can't absorb a duplicate without double-charging or double-sending, you'll discover it at 2 AM on a Sunday.

Read this if your last attempt…

You designed a POST endpoint without thinking about retries
You said "we'll use at-least-once delivery" without explaining how consumers dedupe
You confuse idempotent (safe to repeat) with commutative (safe in any order)
Your payment system could double-charge on a client retry

The concept

An operation is idempotent if running it N times produces the same end state as running it once. PUT /users/42 {name: "Ana"} is idempotent. POST /orders that creates a new order each time is not — two calls create two orders.
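A minimal sketch of that difference, using a hypothetical in-memory store. The helpers `put_user` and `post_order` are illustrative names, not part of any framework:

```python
# Hypothetical in-memory "database", for illustration only.
users: dict[int, dict] = {}
orders: list[dict] = []


def put_user(user_id: int, body: dict) -> None:
    """PUT semantics: set the resource to the given state."""
    users[user_id] = dict(body)


def post_order(body: dict) -> int:
    """POST semantics: append a new resource on every call."""
    orders.append(dict(body))
    return len(orders)  # id of the order just created


# Repeating the PUT leaves the same end state as a single call.
put_user(42, {"name": "Ana"})
put_user(42, {"name": "Ana"})

# Repeating the POST creates a second order.
post_order({"sku": "book"})
post_order({"sku": "book"})
```

After both pairs of calls, `users` holds one record for user 42 while `orders` holds two entries, which is the duplicate a retry would create.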

Idempotency matters because retries are unavoidable:

Architecture diagram · Idempotency key protects against retries

Client sends key K with each retry. Server stores (K → result) for a dedup window. Second attempt returns the cached result instead of re-executing.

Idempotency mechanisms — match the technique to the write shape.

Technique	Good for	Fails when
Naturally idempotent verbs (PUT/DELETE)	State-setting operations	Business needs an event trail of each attempt
Client idempotency key	Payments, order creation, sends	Client is a dumb retry-loop with no key storage
Server-derived key (content hash)	Webhook ingestion, event replay	Two legit identical events are indistinguishable
Unique constraint on business key	Create-if-missing (users, accounts)	You want retry to return the existing record, not 409
Conditional writes (If-Match, CAS)	Updates with optimistic concurrency	High-contention hot rows
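The last row of the table (conditional writes) can be sketched as a version-checked UPDATE. This is an assumption-laden illustration: it uses SQLite so it runs standalone, and the `users` table, its `version` column, and the helper `update_name_if_version` are hypothetical names:

```python
import sqlite3


def update_name_if_version(conn, user_id, expected_version, new_name) -> bool:
    """Compare-and-set: apply the write only if the row is still at the
    version the caller read. Returns True if the write was applied."""
    cur = conn.execute(
        "UPDATE users SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, user_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO users VALUES (42, 'Ana', 1)")
conn.commit()

first = update_name_if_version(conn, 42, 1, "Ana Maria")  # applied
retry = update_name_if_version(conn, 42, 1, "Ana Maria")  # stale version: no-op
```

The retry carries the old version, matches no row, and changes nothing. The caller still has to decide whether a rejected write means "already applied" or "lost to a competing writer".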

How interviewers grade this

Every non-idempotent endpoint accepts an Idempotency-Key header and documents the dedup window.
You name the storage for keys (Redis with TTL, DB table with index) and the window (e.g. 24h).
You state what happens on a key collision with a different payload (reject with 422, not silently replace).
You show how the key and the side effect are committed atomically (same transaction / outbox).
You distinguish idempotent from commutative when the interviewer probes on ordering.

Variants

Stripe-style idempotency key

Client-supplied UUID header; server stores (key → response) with 24h TTL.

The industry-standard pattern for external APIs. Client picks a UUID, retries with the same one until it succeeds or gives up. Server returns the original response on any subsequent call within the window. Different payload under same key → 422.
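A minimal single-process sketch of this contract, assuming an in-memory dict as the dedup store. The class `IdempotencyStore` and the callback `create_order` are hypothetical; a real service would use Redis or a database and would need to handle concurrent requests:

```python
import hashlib
import json
import time

WINDOW_SECONDS = 24 * 60 * 60  # 24h dedup window, as described above


class IdempotencyStore:
    """In-memory sketch; not safe for concurrent use."""

    def __init__(self, clock=time.time):
        self._entries = {}  # key -> (request_hash, response, expires_at)
        self._clock = clock

    def handle(self, key, body, execute):
        request_hash = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[2] > now:
            stored_hash, response, _ = entry
            if stored_hash != request_hash:
                # Same key, different payload: reject instead of replacing.
                return (422, {"error": "key reused with a different payload"})
            return response  # replay the original response, no re-execution
        response = execute(body)
        self._entries[key] = (request_hash, response, now + WINDOW_SECONDS)
        return response


calls = []


def create_order(body):
    calls.append(body)
    return (201, {"order_id": len(calls)})


store = IdempotencyStore()
r1 = store.handle("k1", {"sku": "book"}, create_order)
r2 = store.handle("k1", {"sku": "book"}, create_order)  # replayed
r3 = store.handle("k1", {"sku": "pen"}, create_order)   # rejected with 422
```

Only the first call reaches `create_order`; the retry gets the stored response and the mismatched payload is refused.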

Pros

+Well-understood by clients and SDKs
+Works for any non-idempotent operation
+Clean semantics for the "did my retry work?" case

Cons

−Requires client cooperation
−Dedup store is state on the hot path
−Window bounds the retry horizon

Choose this variant when

Public APIs
Payment, send-money, create-order endpoints

Unique constraint as dedup

Let the DB reject duplicates; catch the unique-violation and return the existing row.

When the operation has a natural unique business key (order_id from the client, external_id from a partner), rely on a DB unique index. On duplicate, fetch the existing record and return it. No separate dedup store, no TTL.
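The insert-then-fetch flow could look roughly like this. It is a sketch under assumptions: SQLite stands in for the production database, and the `orders` table, its `external_id` column, and the helper `create_order` are illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " external_id TEXT NOT NULL UNIQUE,"  # the natural business key
    " amount_cents INTEGER NOT NULL)"
)


def create_order(conn, external_id, amount_cents):
    """Insert the order; on a duplicate business key, return the existing row."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO orders (external_id, amount_cents) VALUES (?, ?)",
                (external_id, amount_cents),
            )
        created = True
    except sqlite3.IntegrityError:
        created = False  # unique violation: this is a retry
    row = conn.execute(
        "SELECT id, external_id, amount_cents FROM orders WHERE external_id = ?",
        (external_id,),
    ).fetchone()
    return created, row


first = create_order(conn, "partner-123", 500)
second = create_order(conn, "partner-123", 500)  # same row, created=False
```

Both calls return the same row; only the `created` flag differs, which lets the handler choose between 201 and 200 if it wants to.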

Pros

+No extra infrastructure
+Dedup window = forever
+Simple mental model

Cons

−Only works when a business key exists
−Insert-then-fetch adds a round-trip on duplicates
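−On a sharded database a unique index is enforced per shard, so the business key must be (or include) the shard key for the constraint to hold globally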

Choose this variant when

Operations with a natural business id
Internal services with controlled clients

Consumer-side dedup for at-least-once queues

Message has an id; consumer stores (id → processed) and skips duplicates.

For queue / log consumers with at-least-once delivery. Each message carries a stable id (from the producer or the broker). Consumer looks up the id in a dedup store before executing. Bounded TTL matches the max replay / retry window.
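A minimal sketch of the lookup-then-execute loop, assuming an in-memory dedup store with a TTL. The class `DedupConsumer` and its injectable `clock` are hypothetical; note that this version marks the id after the handler runs, so a crash between the two would re-execute, and a production consumer would commit the mark together with the side effect:

```python
import time


class DedupConsumer:
    """Skips messages whose id was processed within the last ttl_seconds."""

    def __init__(self, handler, ttl_seconds, clock=time.time):
        self._handler = handler
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # message id -> expiry timestamp

    def on_message(self, message_id, payload) -> bool:
        now = self._clock()
        # Drop expired entries so the store stays bounded (O(n) per call,
        # acceptable for a sketch only).
        self._seen = {m: exp for m, exp in self._seen.items() if exp > now}
        if message_id in self._seen:
            return False  # duplicate delivery: skip
        self._handler(payload)
        self._seen[message_id] = now + self._ttl
        return True


sent = []
consumer = DedupConsumer(sent.append, ttl_seconds=3600)
consumer.on_message("m-1", {"email": "welcome"})  # processed
consumer.on_message("m-1", {"email": "welcome"})  # redelivery, skipped
```

If the same id were redelivered after the TTL had lapsed it would be processed again, which is why the TTL has to cover the broker's replay window.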

Pros

+Standard pattern for event-driven systems
+Works with any broker
+Combines naturally with outbox

Cons

−Dedup store hits on every message (cost)
−TTL must exceed the broker's retention / replay window
−Key design is subtle under fan-out: each consumer group processes the same message id independently, so dedup entries must be scoped per consumer group

Choose this variant when

Kafka / Kinesis / SQS consumers
Any at-least-once delivery path

Worked example

Scenario: POST /payments endpoint.

Contract: Client sends Idempotency-Key: <UUID> header. Same key + same request body within 24h returns the original response. Same key + different body → 422 (prevents cross-request leakage).

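Storage: A Postgres table idempotency_keys(key, request_hash, response_body, response_status, created_at, expires_at) with a unique index on (key). Postgres rejects now() in a partial-index predicate (index predicates must be immutable), so expiry has to be enforced at read time (for example, by deleting an expired row for the key before the INSERT) and by the cleanup job.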

Handler:

1. Compute request_hash = sha256(request_body).
2. In one transaction:

- INSERT INTO idempotency_keys with ON CONFLICT DO NOTHING.
- If the INSERT succeeded, run the payment logic, UPDATE the row with response_body + response_status, COMMIT, return response.
- If the INSERT did nothing (duplicate), SELECT the existing row:
  - If request_hash doesn't match → 422 Unprocessable Entity.
  - If response_body is still null → another worker is mid-flight: return 409 with Retry-After.
  - Otherwise (request_hash matches and a response is stored) → return stored response.

Window & cleanup: 24h TTL. A nightly job deletes rows where expires_at < now() - interval '1 day'. Rationale: payment retries beyond 24h are either a bug or a change in intent; force the client to pick a new key.

Why same-transaction: the INSERT of the key, the payment side effect (ledger write), and the stored response commit in one DB transaction, so they succeed or fail together. If the handler crashes before COMMIT, the key row and the ledger row both roll back and the next retry proceeds as a first attempt. If it crashes after COMMIT but before the client receives the response, the retry finds the key row and replays the stored response. A retry that arrives while the original is still in flight waits briefly for it to finish or times out. Either way, no double-charge.
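The handler can be sketched end to end as follows. Assumptions are loud here: SQLite replaces Postgres so the example runs standalone (its ON CONFLICT clause needs SQLite 3.24 or newer), expires_at handling is omitted, and the `ledger` table plus the functions `open_db` and `handle_payment` are hypothetical names:

```python
import hashlib
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " key TEXT PRIMARY KEY, request_hash TEXT NOT NULL,"
        " response_body TEXT, response_status INTEGER)"
    )
    conn.execute(
        "CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    return conn


def handle_payment(conn, key, body):
    raw = json.dumps(body, sort_keys=True)
    request_hash = hashlib.sha256(raw.encode()).hexdigest()
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        cur = conn.execute(
            "INSERT INTO idempotency_keys (key, request_hash) VALUES (?, ?) "
            "ON CONFLICT (key) DO NOTHING",
            (key, request_hash),
        )
        if cur.rowcount == 1:
            # First attempt: side effect and stored response commit together.
            ledger = conn.execute(
                "INSERT INTO ledger (amount_cents) VALUES (?)",
                (body["amount_cents"],),
            )
            response = {"payment_id": ledger.lastrowid}
            conn.execute(
                "UPDATE idempotency_keys SET response_body = ?, response_status = ? "
                "WHERE key = ?",
                (json.dumps(response), 201, key),
            )
            conn.execute("COMMIT")
            return 201, response
        stored_hash, stored_body, stored_status = conn.execute(
            "SELECT request_hash, response_body, response_status "
            "FROM idempotency_keys WHERE key = ?",
            (key,),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # key row and ledger row vanish together
        raise
    if stored_hash != request_hash:
        return 422, {"error": "idempotency key reused with a different body"}
    if stored_body is None:
        return 409, {"error": "original request still in flight"}
    return stored_status, json.loads(stored_body)


conn = open_db()
first = handle_payment(conn, "key-1", {"amount_cents": 500})
retry = handle_payment(conn, "key-1", {"amount_cents": 500})     # replayed
mismatch = handle_payment(conn, "key-1", {"amount_cents": 900})  # 422
```

The retry returns the stored response and the ledger still holds a single row. Any exception before COMMIT rolls back the key row along with the ledger write, so a later retry starts clean.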

Good vs bad answer

Interviewer probe

“Your POST /orders can be retried by the client. How do you prevent duplicate orders?”

Weak answer

"We'll just not retry. Or the client can check if the order exists before retrying."

Strong answer

"Client sends an Idempotency-Key header — a UUID they generate per logical operation. Server stores (key → response) in Postgres with a 24h TTL, using a unique index. On retry with the same key and same body, we return the stored response without re-executing the order-creation logic; on same key with a different body, we return 422 so a client bug doesn't quietly leak across operations. The key insert and the order insert are in one transaction so a crash between them can't leave an unrecorded order. If the interviewer pushes: I'd also put the same dedup on the downstream consumer if order creation publishes to a queue, keyed on the order_id — two layers of defence because at-least-once delivery is everywhere."

Why it wins: Names the specific mechanism (Idempotency-Key header), the storage and window (Postgres, 24h), the atomicity argument (same-transaction), the edge case (same key, different body), and the defence-in-depth (consumer-side dedup).

Interview playbook

3–4 min, concentrated on any money/booking/send write path

When it comes up

Any write path that moves money, creates an order, or sends a message
The interviewer asks "what happens if the client retries?"
You introduced an at-least-once queue or a webhook ingest
A payment or booking flow where a duplicate is unacceptable

Order of reveal

1. Concede retries are inevitable. Networks drop, load balancers retry on 5xx, at-least-once queues redeliver. The write path has to absorb a duplicate without a double side effect.
2. Prefer idempotent verbs. If the business model allows, I model the operation as a state-setting PUT/DELETE (naturally safe to repeat) before reaching for machinery.
3. Otherwise, idempotency key. For POST /charge the client supplies an Idempotency-Key header. The server stores key → response for a 24h window; a repeat returns the stored response without re-executing.
4. Commit key + effect atomically. The key row and the side effect commit in the same transaction. If I charge the card then write the key, a crash between them double-charges on retry.
5. Extend to the consumer. Idempotency is end-to-end. If order creation publishes to a queue, the consumer dedups on the order id too: two layers, because at-least-once is everywhere.

Signature phrases

“Retries are inevitable, so the write path has to absorb a duplicate.” — Frames idempotency as a property of the system, not an edge case.
“The key and the side effect commit in the same transaction.” — The single detail that separates a correct design from a 2 AM double-charge.
“Same key, different body → 422, not a silent overwrite.” — Shows you have thought about the malformed-retry edge case.
“Idempotency is end-to-end — I dedup at the API and at the consumer.” — Demonstrates defence-in-depth against at-least-once delivery.

Likely follow-ups

“Where do you store the keys at scale?”

Two reasonable homes. Redis with a TTL when the dedup window is short and you want the check off the DB hot path — fast, but you inherit Redis durability limits. A Postgres table with a unique index when you want the dedup decision in the same transaction as the side effect (the safest option for payments) — slower but atomic. I default to the DB table for money flows and Redis for high-volume, lower-stakes dedup.

“Two retries arrive at exactly the same time. What happens?”

The unique constraint on the key serialises them: the first INSERT wins and executes; the second hits the conflict. The loser either (a) waits 100–500 ms and re-reads the stored response (best for payments — hides the race from the user), or (b) returns 409 with Retry-After. Without that constraint both could pass a naive "does the key exist?" check and double-execute.

“Can you make "send money" idempotent without the client cooperating?”

Only weakly. You derive a key from (from_account, to_account, amount, time_bucket). That catches a fast retry but cannot distinguish two legitimate identical transfers — Alice may really want to send Bob $10 twice in a minute. High-value systems push the key to the client because that is the only layer that knows intent; a tight server-derived bucket (a few seconds) is a fallback, with the ambiguity documented.
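A sketch of such a derived key, assuming a fixed-width time bucket. The function `derived_key`, its parameter names, and the 5-second default are illustrative choices, not a recommendation:

```python
import hashlib


def derived_key(from_account, to_account, amount_cents, timestamp, bucket_seconds=5):
    """Server-derived idempotency key: identical transfers that fall in the
    same time bucket map to the same key."""
    bucket = int(timestamp // bucket_seconds)
    material = f"{from_account}|{to_account}|{amount_cents}|{bucket}"
    return hashlib.sha256(material.encode()).hexdigest()


# A retry 2 seconds later lands in the same bucket, so the keys match.
same_bucket = derived_key("alice", "bob", 1000, 100.0) == derived_key(
    "alice", "bob", 1000, 102.0
)

# An identical transfer 7 seconds later gets a different key.
next_bucket = derived_key("alice", "bob", 1000, 100.0) == derived_key(
    "alice", "bob", 1000, 107.0
)
```

Fixed buckets also have an edge at the boundary: two requests a fraction of a second apart can straddle two buckets and be treated as distinct, which is another reason to treat this as a fallback.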

Common mistakes

Side effect first, key second

If you charge the card and then write the idempotency key, a crash between the two leaves you with a charge that will be repeated on retry. Always commit the key and the side effect in one transaction — or use an outbox pattern where the key write gates the publish.
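One way to picture the outbox variant, as a sketch under assumptions: SQLite stands in for the service database, and the `outbox` table and the helpers `accept_charge` and `relay` are hypothetical names. The key row and the outbox row commit together, and a separate relay step publishes whatever has been committed:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def accept_charge(conn, key, payload) -> bool:
    """Record the key and queue the side effect in one transaction."""
    try:
        with conn:  # both rows commit, or neither does
            conn.execute("INSERT INTO idempotency_keys (key) VALUES (?)", (key,))
            conn.execute(
                "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(payload),)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate key: nothing new is queued


def relay(conn, publish) -> int:
    """Publish committed outbox rows, then mark them as published."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
accept_charge(conn, "k1", {"amount_cents": 500})
accept_charge(conn, "k1", {"amount_cents": 500})  # retry: rejected by the key
relay(conn, sent.append)
```

The relay itself is at-least-once (a crash between `publish` and the UPDATE republishes the row), so whatever consumes the published message still needs its own dedup.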

Treating "idempotent" as "safe on any schedule"

Idempotent means safe to repeat, not safe under concurrent execution. Two concurrent retries with the same key can both reach the "key missing" branch. Use row-level locks, unique indexes, or an insert-first claim on the key row to serialise them.

No dedup window

Infinite-lifetime keys grow the dedup store without bound. Finite lifetime creates the edge case of "retry arrives after expiry" (now it re-executes). Size the TTL to the actual retry horizon (24–72h covers almost everything); document the contract so clients don't retry beyond it.

Idempotency key stored per endpoint but not per user (Advanced)

If uniqueness is enforced on the key value alone, one user's key can collide with another's, and the second caller may be served the first caller's stored response. Scope the key to (tenant_id, user_id, key) in the dedup table: a small detail with sharp consequences in multi-tenant systems.
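A sketch of the scoped constraint, assuming SQLite for a standalone example. The table layout and the helper `claim` are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys ("
    " tenant_id TEXT NOT NULL,"
    " user_id TEXT NOT NULL,"
    " key TEXT NOT NULL,"
    " PRIMARY KEY (tenant_id, user_id, key))"  # uniqueness is per caller
)


def claim(conn, tenant_id, user_id, key) -> bool:
    """Return True if this caller is the first to use the key."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO idempotency_keys (tenant_id, user_id, key) "
                "VALUES (?, ?, ?)",
                (tenant_id, user_id, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


a = claim(conn, "tenant-1", "user-1", "abc")  # first use
b = claim(conn, "tenant-2", "user-9", "abc")  # same key, other tenant: no clash
c = claim(conn, "tenant-1", "user-1", "abc")  # genuine retry: detected
```

Two tenants can pick the same key value without interfering, while a true retry from the same caller is still caught.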

Forgetting the consumer side (Advanced)

An idempotent API endpoint can still produce duplicate side effects downstream if the queue between API and consumer is at-least-once and the consumer isn't dedup-aware. Idempotency is an end-to-end property, not a per-hop one.

Practice drills

Walk me through how Stripe handles the Idempotency-Key header.

Client generates a UUID, sends it as Idempotency-Key. Server stores (key → full response) on first request. Within 24h: same key + same request → the original status code and body, replayed with no re-execution; same key + different request body → 400 (or a specific "idempotency mismatch" error). The dedup window means a retry must happen within 24h; after that, a new key is required. The key is scoped to the API key (merchant) so keys don't collide across merchants.

Interviewer: "what if two retries arrive at the same time?"

The first one to execute the INSERT into the idempotency_keys table wins (unique constraint). The loser sees a unique-violation or a "key exists with no response yet" state. Options: (a) wait briefly (100–500 ms) and re-read — usually the first request finishes; (b) return 409 Conflict with Retry-After — let the client try again. Which one depends on the UX; for payments we prefer (a) to reduce client-visible flakiness.

Can you make a "send money" operation idempotent without a client-supplied key?

Only weakly. You'd derive a key from (from_account, to_account, amount, time_bucket). This catches retries within a bucket but can't distinguish two legitimate identical transfers (Alice really does want to send Bob $10 twice in a minute). That's why high-value systems push the key to the client — it's the only place that knows intent. If you must derive server-side, make the bucket tight (e.g. 5 seconds) and document the edge case.

Cheat sheet

•Idempotent = same end state whatever the retry count.
•Prefer naturally idempotent verbs (PUT, DELETE, SET) when the business allows.
•Otherwise: client-supplied Idempotency-Key header + server-side dedup store.
•Dedup storage: in DB with unique index, or Redis with TTL. Typical window: 24h.
•Key + side effect in same transaction. Otherwise a crash between them means a duplicate side effect on retry.
•Same key + different payload → 422, not silent replacement.
•At-least-once everywhere → dedup at every hop, not just the API layer.
•Scope keys per tenant in multi-tenant systems.

