Message Queue (SQS / RabbitMQ)

A buffer between producers and workers that decouples them, absorbs bursts, and processes work asynchronously with built-in retries and dead-letter handling — the simple task-queue answer when you do not need a full event log.

Also worth naming: Amazon SQS · RabbitMQ · ActiveMQ · Google Pub/Sub · Azure Service Bus

A message queue is the "do this later, reliably" primitive: enqueue work, return immediately, and let workers process it at their own pace with retries and a dead-letter safety net. Reach for it before reaching for Kafka.

What it is

A message queue is a buffer that sits between a producer (which creates work) and consumers/workers (which do it). The producer drops a message on the queue and returns immediately; workers pull messages, process them, and acknowledge each when done. That simple decoupling is the backbone of asynchronous processing — it lets a system accept work fast and do the slow part in the background, smooth out bursts, and scale producers and consumers independently.

Unlike a streaming log like Kafka, a classic queue is a drained structure: once a message is acknowledged it is gone, there is no replay, and you do not think in partitions and offsets. What you get instead is a lighter, simpler model with per-message lifecycle: a visibility timeout hides an in-flight message so two workers don't both process it, redelivery if a worker crashes before acking, automatic retries with backoff, and a dead-letter queue for messages that keep failing. SQS is the fully-managed, effectively-infinite AWS queue (fire-and-forget background jobs); RabbitMQ is a self-hosted broker with rich routing (exchanges, topics, priorities).

In an interview, a message queue is the right answer for background jobs, async work, and burst absorption — send an email, transcode a thumbnail, process an upload, fan a task out to workers — where you need reliable delivery and easy retries but not retention, replay, or many independent consumer groups. If you need those, you reach for Kafka; if you just need "do this later, reliably," you reach for a queue, and you do not stand up Kafka for it.

When to reach for it

Reach for this when…

Background / asynchronous jobs — email, transcoding, image processing, report generation
Decoupling a fast producer from slower workers, absorbing traffic bursts
You need reliable delivery with retries and a dead-letter queue, simply
Task distribution across a worker pool that scales independently of producers

Not really this pattern when…

You need retention and replay, or many independent consumers of the same stream (that is Kafka)
You need very high sustained throughput as an event backbone (Kafka)
The work must happen synchronously within the request (a queue adds enqueue, poll, and processing latency, so it will usually blow a tight latency budget)
You need strict total ordering at scale (a single FIFO queue caps throughput)

How it works

Four ideas cover almost every interview use:

1. Decouple, then absorb and scale independently. The producer enqueues and returns; workers pull at their own rate. This breaks the tight coupling between accepting work and doing it: a burst of 1,000 requests against workers that handle 200/sec doesn't drop 800 — they wait in the queue and drain. Producers and consumers scale separately, and a worker outage just means the queue grows until workers return.
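To make the burst arithmetic above concrete, here is a minimal sketch. It assumes the whole burst arrives at once and that workers drain at a constant rate; the function name `simulate_burst` is hypothetical and exists only for illustration.

```python
def simulate_burst(burst_size: int, drain_per_sec: int) -> list[int]:
    """Return the queue depth at the end of each second until the queue is empty."""
    depth = burst_size
    depths = []
    while depth > 0:
        # Workers remove up to drain_per_sec messages each second.
        depth = max(0, depth - drain_per_sec)
        depths.append(depth)
    return depths


depths = simulate_burst(burst_size=1000, drain_per_sec=200)
print(depths)       # [800, 600, 400, 200, 0]
print(len(depths))  # 5 seconds to drain, and nothing was dropped
```

The 800 requests that could not be served in the first second are delayed rather than lost; the cost of the burst is latency, not errors.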

Architecture diagram: A queue decouples a producer from a pool of workers

The producer enqueues a message and moves on; workers pull messages at their own pace and acknowledge each one when done. The queue absorbs bursts and lets you scale producers and consumers independently.

2. Per-message reliability: visibility timeout, ack, redelivery. When a worker pulls a message it becomes invisible for a timeout so no other worker grabs it. The worker acknowledges on success and the message is deleted; if it crashes before acking, the timeout lapses and the message is redelivered. Commit the side effect before you ack, and you get at-least-once delivery — which is why workers must be idempotent.
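The lifecycle described above can be sketched as a toy in-memory queue. This is an illustration under stated assumptions, not how SQS or RabbitMQ is implemented: the class `VisibilityQueue`, its methods, and the logical clock passed as `now` are all hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    visible_at: float = 0.0   # time from which the message may be received again
    receive_count: int = 0


class VisibilityQueue:
    """Toy queue with a visibility timeout, driven by a logical clock."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages: dict[int, Message] = {}
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = Message(body)
        return msg_id

    def receive(self, now: float):
        """Return (msg_id, body) for the first visible message, or None."""
        for msg_id, msg in self._messages.items():
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout  # hide it
                msg.receive_count += 1
                return msg_id, msg.body
        return None

    def ack(self, msg_id: int) -> None:
        """Delete the message so it is never delivered again."""
        self._messages.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30)
q.send("resize image 42")

first = q.receive(now=0)         # worker A takes the message
hidden = q.receive(now=10)       # worker B sees nothing: the message is in flight
redelivered = q.receive(now=31)  # worker A crashed without acking; it reappears
q.ack(redelivered[0])            # worker B finishes and acks
empty = q.receive(now=100)       # nothing left to deliver
```

The same message body is handed out twice here (`first` and `redelivered`), which is exactly the at-least-once behavior that forces workers to be idempotent.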

3. Retries and the dead-letter queue. Transient failures are retried (often with backoff). But a poison message — one that always fails (malformed, references deleted data) — would otherwise be retried forever and block the queue. A max-receive count routes it to a dead-letter queue after N attempts, where it is preserved for inspection and an alert fires. Every production queue has a DLQ with a threshold and an alarm.

Architecture diagram: Visibility timeout + acknowledgement + dead-letter queue

A pulled message is hidden (visibility timeout) until the worker acks it. If the worker crashes, the timeout lapses and the message is redelivered. After N failed attempts a poison message is routed to a dead-letter queue and an alert fires, instead of blocking the queue forever.
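A rough sketch of the max-receive rule follows. It is a simplification: redelivery is modeled by re-appending the message instead of waiting for a timeout, and `drain_with_dlq`, `handler`, and the threshold of 3 are hypothetical choices for the example.

```python
from collections import deque


def drain_with_dlq(messages, handler, max_receive_count=3):
    """Process messages; after max_receive_count failed receives, dead-letter them."""
    queue = deque((body, 0) for body in messages)   # (body, receive_count)
    done, dead_letters = [], []
    while queue:
        body, receives = queue.popleft()
        receives += 1
        try:
            handler(body)
            done.append(body)                   # success: acked, message is gone
        except Exception:
            if receives >= max_receive_count:
                dead_letters.append(body)       # poison: parked for inspection
            else:
                queue.append((body, receives))  # not acked: redelivered later
    return done, dead_letters


def handler(body):
    # Stand-in for real work; one message can never be parsed.
    if body == "malformed":
        raise ValueError("cannot parse message")


done, dead = drain_with_dlq(["a", "malformed", "b"], handler)
print(done)  # ['a', 'b']
print(dead)  # ['malformed']
```

Without the `max_receive_count` check the loop would never terminate, which is the in-miniature version of a poison message jamming a queue.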

3b. Bound the queue and watch its depth. A queue is a smoother, not infinite capacity: if the arrival rate exceeds the drain rate on average, depth grows without bound (Little's Law, L = λW, only describes the steady state you get when drain keeps up). Queue depth is the earliest overload signal, so alert on it and autoscale workers; if you genuinely can't keep up, apply backpressure at the producer.
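As a small illustration of why depth is the signal to alert on, the sketch below tracks depth second by second under constant rates. The rates, the threshold, and the names `depth_over_time` and `first_alarm_second` are assumptions made up for the example.

```python
def depth_over_time(arrival_per_sec, drain_per_sec, seconds, start_depth=0):
    """Queue depth at the end of each second for constant arrival and drain rates."""
    depth = start_depth
    history = []
    for _ in range(seconds):
        depth = max(0, depth + arrival_per_sec - drain_per_sec)
        history.append(depth)
    return history


def first_alarm_second(history, threshold):
    """Second at which depth first reaches the alert threshold, or None."""
    for second, depth in enumerate(history, start=1):
        if depth >= threshold:
            return second
    return None


overloaded = depth_over_time(arrival_per_sec=250, drain_per_sec=200, seconds=5)
healthy = depth_over_time(arrival_per_sec=150, drain_per_sec=200, seconds=5,
                          start_depth=100)

print(overloaded)                           # [50, 100, 150, 200, 250]
print(healthy)                              # [50, 0, 0, 0, 0]
print(first_alarm_second(overloaded, 150))  # 3
```

A backlog with spare drain capacity shrinks to zero; a sustained 50/sec deficit grows linearly and never recovers without more workers or less input.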

4. Queue vs log — pick by need. SQS/RabbitMQ are queues: drained, per-message ack, easy retries/DLQ, no replay, light to operate. Kafka is a log: retained, replayable, partitioned, many consumer groups, higher throughput, heavier. Need a task queue → SQS/RabbitMQ. Need an event stream → Kafka. Don't run Kafka for a job queue, and don't try to make a queue replay history.

Performance envelope

Message queue characteristics — what to reason about.

Dimension	Queue (SQS / RabbitMQ)	Why it matters
Model	Drained — gone after ack	No replay; simpler than a log
Delivery	At-least-once (FIFO option for exactly-once-ish)	Workers must be idempotent
Throughput	High (SQS ~unlimited std); moderate (RabbitMQ)	Plenty for task queues; not a Kafka-scale backbone
Ordering	Best-effort (std) / strict (FIFO, lower throughput)	Strict order caps parallelism
Retries / DLQ	Built-in: visibility timeout, max-receive, DLQ	Reliability without custom code
Ops burden	SQS: none (managed); RabbitMQ: you run it	SQS for simplicity; RabbitMQ for routing control

Capabilities in interviews

Asynchronous background jobs

Accept a request fast, return immediately, and do the slow work in a worker.

The canonical use. Anything that takes more than a moment or doesn't need to block the response goes on a queue:

text

POST /signup → write user, enqueue "send welcome email" → 200 now
worker: pull → send email → ack

Email, thumbnail generation, PDF/report rendering, search indexing, webhook delivery — all classic queue jobs. The user gets an instant response; the slow part happens reliably in the background with retries if it fails. This is the default pattern for "do X, but not in the request."
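The signup flow above can be sketched with Python's standard library. Note the assumption: `queue.Queue` is an in-process, non-durable stand-in for a real broker, and `handle_signup`, `worker`, and `sent_emails` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()
sent_emails = []


def handle_signup(email: str) -> int:
    """Request path: persist the user, enqueue the slow work, return at once."""
    # (Writing the user row is omitted in this sketch.)
    jobs.put({"type": "send_welcome_email", "to": email})
    return 200


def worker() -> None:
    """Background path: pull a job, do the slow work, then acknowledge it."""
    while True:
        job = jobs.get()
        if job is None:                 # sentinel: shut the worker down
            jobs.task_done()
            break
        sent_emails.append(job["to"])   # stand-in for actually sending the email
        jobs.task_done()                # plays the role of the ack


t = threading.Thread(target=worker, daemon=True)
t.start()
status = handle_signup("ada@example.com")  # returns before the email is sent
jobs.put(None)
jobs.join()   # wait until the worker has acknowledged everything
t.join()
```

With a real broker the handler and the worker would be separate processes, and the job would survive a crash of either one.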

Choose this variant when

Email / notifications / webhooks
Media processing, report generation
Any work that should not block the response

Burst absorption & load levelling

Buffer spiky inbound traffic so steady-rate workers are never overwhelmed.

When inbound traffic spikes far above what downstream can handle, the queue absorbs the surge and workers drain it at a sustainable rate:

text

upload spike 10× → queue grows → workers (+ autoscale on depth) drain it

Instead of dropping requests or melting the database during a spike, the queue holds the work and smooths it out. Pair with autoscaling on queue depth so the worker fleet grows during the surge and shrinks after — load levelling without over-provisioning for peak.
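One plausible way to turn queue depth into a fleet size is sketched below. The sizing rule, the bounds, and the name `desired_workers` are assumptions for illustration; a real autoscaler would also smooth the signal and add cooldowns.

```python
import math


def desired_workers(queue_depth, per_worker_per_sec, target_drain_sec,
                    min_workers=1, max_workers=50):
    """Workers needed to drain the current backlog within target_drain_sec."""
    needed = math.ceil(queue_depth / (per_worker_per_sec * target_drain_sec))
    # Clamp so the fleet never scales to zero or past a cost ceiling.
    return max(min_workers, min(max_workers, needed))


print(desired_workers(0, 10, 60))        # 1  (idle: stay at the floor)
print(desired_workers(12_000, 10, 60))   # 20 (12,000 / (10 * 60))
print(desired_workers(100_000, 10, 60))  # 50 (capped at the ceiling)
```

The cap matters: if the ceiling is reached and depth still grows, scaling alone is not enough and the producer needs backpressure.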

Choose this variant when

Bursty ingestion or request spikes
Protecting a slower downstream (DB, third-party API)
Smoothing load to avoid over-provisioning

Task distribution across workers

Spread a stream of tasks across a worker pool that scales independently.

A queue is a natural work-distribution mechanism: many workers pull from one queue, each taking the next available task, so work is balanced automatically and you scale by adding workers:

text

queue → [worker × N]   (competing consumers)

Add workers to go faster, remove them to save cost. No coordination is needed, because the queue hands each message to one worker at a time (the visibility timeout hides it from the others while it is in flight). This "competing consumers" pattern is how you parallelize a backlog of independent tasks.
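A minimal sketch of competing consumers, assuming an in-process `queue.Queue` as a stand-in for the broker; `run_pool` is a hypothetical helper that counts how often each task was handled.

```python
import queue
import threading
from collections import Counter


def run_pool(tasks, num_workers):
    """Drain tasks with num_workers threads; return how often each task ran."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    handled = Counter()
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = work.get_nowait()   # each get hands the task to one worker
            except queue.Empty:
                return                     # backlog drained: worker exits
            with lock:
                handled[task] += 1
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return handled


handled = run_pool(range(100), num_workers=4)
print(len(handled), set(handled.values()))  # 100 {1}
```

The workers never talk to each other; the queue is the only coordination point, which is why adding or removing workers requires no code changes.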

Choose this variant when

Parallelizing a backlog of independent tasks
Worker pools that scale with depth
Decoupling task creation from execution

Routing & pub-sub (RabbitMQ)

Route messages to different queues by topic/rule, or fan one message out to many.

RabbitMQ adds rich routing via exchanges: route by routing key, match topic patterns, or broadcast to all bound queues (fan-out):

text

order.created → [billing queue, shipping queue, analytics queue]   (fan-out exchange)

This gives queue simplicity with some of the multi-consumer flexibility of a log, plus priorities and per-message TTLs. For AWS-native fan-out, the equivalent is SNS → multiple SQS queues. When you need routing logic but not a full event log, this is the middle ground.
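To illustrate pattern-based routing, here is a sketch of AMQP-style topic matching, where `*` stands for exactly one dot-separated word and `#` for zero or more. It models a topic exchange rather than the fan-out exchange in the example above, and the binding table, queue names, and functions `topic_matches` and `route` are hypothetical.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' either matches nothing, or swallows one word and stays active.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


def route(bindings, routing_key):
    """Return every queue whose binding pattern matches the routing key."""
    return [q for pattern, q in bindings if topic_matches(pattern, routing_key)]


bindings = [
    ("order.*", "billing"),
    ("order.created", "shipping"),
    ("#", "analytics"),          # catch-all binding
    ("payment.#", "ledger"),
]

print(route(bindings, "order.created"))          # ['billing', 'shipping', 'analytics']
print(route(bindings, "payment.card.declined"))  # ['analytics', 'ledger']
```

Each matching queue gets its own copy of the message, so billing, shipping, and analytics each keep independent ack, retry, and dead-letter behavior.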

Choose this variant when

Routing messages by type to different workers
Fan-out to several queues (SNS→SQS, RabbitMQ exchanges)
Priorities or per-message TTL needs

Operating knobs

Queue vs log (SQS/RabbitMQ vs Kafka)

The first decision. Need a task queue — per-message ack, easy retries/DLQ, no replay, light ops? SQS/RabbitMQ. Need an event log — retention, replay, many independent consumer groups, very high throughput? Kafka. Using Kafka for simple background jobs is over-engineering; using a queue where you need replay loses data you needed.

Visibility timeout & idempotency

Set the visibility timeout longer than the worst-case processing time, or a slow message gets redelivered and processed twice. Delivery is at-least-once regardless, so workers must be idempotent (dedupe on a message id / business key). Commit the side effect before acking so a crash redelivers rather than drops.
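One way to make a consumer idempotent is to record the message id in the same transaction as the side effect. The sketch below uses an in-memory SQLite database; the table names, `handle_charge`, and the message ids are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE charges (order_id TEXT, amount_cents INTEGER)")


def handle_charge(message_id: str, order_id: str, amount_cents: int) -> bool:
    """Apply the charge once; return False if this message was already handled."""
    try:
        with db:  # one transaction: dedupe record and side effect commit together
            db.execute("INSERT INTO processed (message_id) VALUES (?)",
                       (message_id,))
            db.execute("INSERT INTO charges (order_id, amount_cents) VALUES (?, ?)",
                       (order_id, amount_cents))
        return True
    except sqlite3.IntegrityError:
        # Primary-key clash: a redelivery. Safe to ack without redoing the work.
        return False


first = handle_charge("msg-1", "order-9", 1999)
again = handle_charge("msg-1", "order-9", 1999)   # same message, delivered twice
total = db.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
print(first, again, total)  # True False 1
```

In both cases the worker acks only after `handle_charge` returns, so a crash mid-transaction rolls back and the redelivery simply runs it again.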

Retries, backoff & dead-letter queue

Retry transient failures with exponential backoff, but cap attempts with a max-receive count that routes poison messages to a DLQ — otherwise one bad message is retried forever and blocks others. Every queue needs a DLQ and an alarm on its depth; the DLQ is for inspection, not infinite retry.
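A common shape for the retry delay is capped exponential backoff with jitter. This is a sketch with assumed parameters (1 s base, 60 s cap); `backoff_delay` and `jittered_delay` are hypothetical helper names.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on the wait before retry number `attempt` (1-based)."""
    return min(cap, base * 2 ** (attempt - 1))


def jittered_delay(attempt: int, rng: random.Random,
                   base: float = 1.0, cap: float = 60.0) -> float:
    """'Full jitter': pick uniformly between 0 and the exponential bound."""
    return rng.uniform(0, backoff_delay(attempt, base, cap))


bounds = [backoff_delay(n) for n in range(1, 9)]
print(bounds)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]

rng = random.Random(0)
delays = [jittered_delay(n, rng) for n in range(1, 9)]  # random, within the bounds
```

The jitter spreads retries out so that many workers failing against the same downstream do not all retry at the same instant.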

Ordering & throughput (standard vs FIFO)

Standard queues are high-throughput but best-effort order and at-least-once. FIFO queues give strict ordering and de-duplication but at much lower throughput. Use standard unless ordering is essential; if you need order, scope it (FIFO message groups per key) so you do not serialize everything.
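The idea of scoping order to a key can be sketched as a toy queue that keeps messages ordered within a group while letting different groups proceed in parallel. The class `GroupedFifoQueue` is hypothetical and only loosely modeled on FIFO message groups; it ignores timeouts and de-duplication.

```python
from collections import deque


class GroupedFifoQueue:
    """Toy FIFO queue: strict order within a group, parallelism across groups."""

    def __init__(self):
        self._pending = deque()   # (group_id, body) in arrival order
        self._in_flight = set()   # groups that have a message being processed

    def send(self, group_id, body):
        self._pending.append((group_id, body))

    def receive(self):
        """Return the oldest message whose group has nothing in flight, or None."""
        for index, (group_id, body) in enumerate(self._pending):
            if group_id not in self._in_flight:
                break
        else:
            return None
        del self._pending[index]
        self._in_flight.add(group_id)
        return group_id, body

    def ack(self, group_id):
        """Finish the in-flight message and unblock the rest of its group."""
        self._in_flight.discard(group_id)


q = GroupedFifoQueue()
q.send("user-1", "a1")
q.send("user-1", "a2")
q.send("user-2", "b1")

r1 = q.receive()   # ('user-1', 'a1')
r2 = q.receive()   # ('user-2', 'b1'): a2 waits behind a1, but user-2 is free
r3 = q.receive()   # None: the only pending message is blocked by its group
q.ack("user-1")
r4 = q.receive()   # ('user-1', 'a2')
```

Ordering costs parallelism only within a key: two workers could handle `a1` and `b1` at the same time, but never `a1` and `a2`.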

Versus the alternatives

Message queue vs adjacent options.

Dimension	SQS / RabbitMQ	Kafka	Redis Streams
Model	Drained queue (delete on ack)	Retained, replayable log	Lightweight log + groups
Replay	No — gone after ack	Yes — rewind offset	Limited (bounded, in-memory)
Retries / DLQ	Built-in (visibility, max-receive, DLQ)	You build it (offsets + dedupe)	XCLAIM / manual
Throughput	High (SQS) / moderate (RabbitMQ)	Very high (100K–1M+/sec)	High (single node)
Best for	Background jobs, task queues, bursts	Event streaming, CDC, multi-consumer	Queues without standing up Kafka

Failure modes & gotchas

Non-idempotent consumers

Delivery is at-least-once, so a message can be processed more than once (crash before ack, visibility timeout too short, redelivery). A non-idempotent worker double-charges, double-sends, or double-applies. Make consumers idempotent — dedupe on a message id or business key before the side effect.

No dead-letter queue → poison message jam

A message that always fails will be retried forever, consuming workers and (for FIFO/ordered queues) blocking everything behind it. Configure a max-receive count and a DLQ so poison messages are set aside after N attempts and a human is alerted, instead of jamming the pipeline.

Treating the queue as infinite capacity

A queue smooths bursts; it does not add capacity. If the average arrival rate exceeds the worker drain rate, the queue never reaches a steady state and depth grows without bound until messages age out or memory/cost blows up. Autoscale workers on depth, and apply backpressure at the producer when you genuinely cannot keep up.
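Producer-side backpressure can be as simple as refusing work once a bounded buffer is full. The sketch below uses an in-process `queue.Queue` with a tiny `maxsize` as a stand-in; the status codes, the `Retry-After` value, and the name `try_enqueue` are assumptions for illustration.

```python
import queue


def try_enqueue(q: queue.Queue, job, retry_after_sec: int = 5):
    """Accept the job if there is room; otherwise tell the caller to back off."""
    try:
        q.put_nowait(job)
        return 202, {}
    except queue.Full:
        # Shed load at the edge instead of letting the backlog grow unbounded.
        return 429, {"Retry-After": str(retry_after_sec)}


bounded = queue.Queue(maxsize=2)
responses = [try_enqueue(bounded, n) for n in range(3)]
print([status for status, _ in responses])  # [202, 202, 429]
print(responses[-1][1])                     # {'Retry-After': '5'}
```

With a managed queue that has no hard size limit, the same decision would be driven by the observed depth or message age rather than by a `Full` exception.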

Visibility timeout shorter than processing time (Advanced)

If a message takes longer to process than its visibility timeout, the queue assumes the worker died and redelivers it — so a slow job runs twice (or more). Set the timeout above worst-case processing time, or extend it (heartbeat) for long jobs.
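The effect of a heartbeat can be checked with a small timeline model. It assumes the message is received at t=0 and that each heartbeat resets the timeout to its full length (SQS exposes this as ChangeMessageVisibility); the function `reappears_before_done` is hypothetical.

```python
def reappears_before_done(processing_time, visibility_timeout, heartbeat_every=None):
    """True if the message becomes visible again before the worker finishes."""
    deadline = visibility_timeout   # received at t=0, hidden until this time
    if heartbeat_every:
        beat = heartbeat_every
        while beat < processing_time:
            if beat > deadline:
                return True         # heartbeat came after the message reappeared
            deadline = beat + visibility_timeout   # push the deadline out
            beat += heartbeat_every
    return deadline < processing_time


print(reappears_before_done(100, 30))                      # True: runs twice
print(reappears_before_done(100, 30, heartbeat_every=20))  # False: stays hidden
print(reappears_before_done(100, 30, heartbeat_every=40))  # True: beat too slow
```

The heartbeat interval has to be shorter than the timeout it is extending, otherwise the message reappears before the first extension lands.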

Putting a queue on a synchronous low-latency path (Advanced)

A queue is for async work. Inserting one into a path with a tight latency budget (sub-500ms) almost guarantees you blow it — you have added enqueue, poll, and processing latency. Use a queue only when the work can happen out of band; keep synchronous calls synchronous.

In production

Stripe

Reliable asynchronous jobs behind synchronous payments

Stripe processes payments synchronously but offloads the enormous amount of follow-on work — sending receipts and webhooks, updating analytics, syncing to downstream systems, retrying failed external calls — onto asynchronous queues. A charge succeeds in the request path; the dozens of side effects it triggers are enqueued and processed by workers, so the customer-facing API stays fast and the side effects get at-least-once reliability with retries.

Stripe's engineering is also famous for the discipline this page preaches: idempotency everywhere (so a redelivered job never double-charges or double-sends), retries with backoff, and dead-letter handling for jobs that keep failing. It is the canonical illustration of "do the slow, retryable work off the request path on a queue, and make consumers idempotent because delivery is at-least-once."

Takeaway: Keep the user-facing request fast and push follow-on work (webhooks, emails, syncs) onto a queue with idempotent, retrying consumers — at-least-once + idempotency = effectively-once.

Airbnb

A firehose of background jobs on managed queues

Airbnb runs a vast amount of asynchronous work — sending messages and notifications, processing images, updating search indexes, computing pricing signals — through queue-backed job systems processing billions of tasks. The pattern is uniform: a user action enqueues work, a fleet of workers drains it, and the system autoscales workers on queue depth so a surge (a holiday booking spike) is absorbed by the queue and worked off at a sustainable rate rather than overwhelming downstream stores.

Their use is the textbook "queue as a load-leveling buffer plus task distributor": producers and consumers scale independently, bursts are smoothed, failed jobs retry and ultimately dead-letter, and the same job system handles thousands of distinct task types. It shows the message queue as the workhorse decoupling primitive that nearly every large product runs on.

Takeaway: Queues are the universal decoupling + load-leveling primitive — enqueue on user action, autoscale workers on depth so bursts are absorbed, retry then dead-letter failures.

Good vs bad answer

Interviewer probe

“When a user uploads a video, you must transcode it into several resolutions — which takes minutes. How does the upload request work, and how is the transcoding done reliably?”

Weak answer

"The upload endpoint transcodes the video before responding, looping over each resolution, then returns success once all the renditions are ready."

Strong answer

"The request must not block on minutes of transcoding, so I decouple with a message queue. The upload handler stores the file (S3) and enqueues a transcode job, then returns 202 immediately. A pool of workers pulls jobs from the queue and transcodes at its own pace, scaling independently of upload traffic. Each job uses a visibility timeout longer than the transcode time so two workers don't grab the same one, and the worker acks only after the renditions are written — so if it crashes mid-transcode, the message is redelivered and retried. Because delivery is at-least-once, the worker is idempotent (keyed on upload id, it skips work already done). A failing job retries with backoff, and after a max-receive count a poison job goes to a dead-letter queue with an alert instead of jamming the pipeline. I autoscale the worker fleet on queue depth so a surge of uploads is absorbed and drained. I'd use SQS here — managed, effectively infinite, perfect for fire-and-forget background jobs; I wouldn't reach for Kafka because I don't need retention, replay, or many consumer groups for a transcode task queue."

Why it wins: Decouples with a queue and returns immediately, covers the full reliability story (visibility timeout, ack-after-side-effect, redelivery, idempotency, retries, DLQ), autoscales on depth, and explicitly picks a queue over Kafka with the right reasoning.

Interview playbook

1–2 min whenever async/background work or burst absorption appears

When it comes up

Background / async work — email, transcoding, image processing, reports, webhooks
A slow operation that should not block the request
Bursty traffic that would overwhelm a downstream if applied directly
The interviewer asks "how do you do this work without making the user wait?"

Order of reveal

1. Decouple and return fast. The producer enqueues the work and returns immediately; workers process it asynchronously at their own pace.
2. Reliability per message. Visibility timeout hides in-flight messages, the worker acks after the side effect, and a crash redelivers: at-least-once.
3. Idempotent consumers. Because it is at-least-once, workers dedupe on a message/business id so a redelivery is harmless.
4. Retries + DLQ. Transient failures retry with backoff; after a max-receive count a poison message goes to a dead-letter queue with an alert.
5. Bound + autoscale. I alert on queue depth and autoscale workers; a queue smooths bursts, it does not add capacity.

Signature phrases

“A queue is "do this later, reliably" — enqueue and return.” — States the core value in one line.
“At-least-once delivery, so consumers must be idempotent.” — The non-negotiable correctness point.
“Every queue has a dead-letter queue with a max-receive and an alarm.” — Operational maturity interviewers look for.
“A queue smooths bursts; it does not add capacity.” — Avoids the classic infinite-buffer mistake.

Likely follow-ups

“A message keeps failing. Walk me through what happens.”

On each failure the worker does not ack, so the visibility timeout lapses and the message is redelivered, retried with backoff. A receive counter increments each attempt; once it crosses the configured max-receive count, the queue stops redelivering it to the main consumers and routes it to a dead-letter queue, where it is preserved for inspection and an alert pages the owning team. This is critical because without it a single poison message — malformed, or referencing deleted data so it can never succeed — would be retried forever, burning worker capacity and, on an ordered queue, blocking every message behind it. The DLQ is for diagnosis, not infinite retry.

“When do you choose Kafka over SQS/RabbitMQ?”

When I need what a log provides and a queue does not: retention and replay (rewind and reprocess after a bug), many independent consumer groups reading the same stream (billing, search, analytics all consuming one event stream), ordered partitions, or very high sustained throughput as an event backbone. A drained queue gives none of those — once a message is acked it is gone, and fan-out needs SNS/exchanges. Conversely, for simple background jobs with easy retries and a DLQ, Kafka is heavier than the problem warrants. Rule of thumb: task queue → SQS/RabbitMQ; event stream / replay / multi-consumer → Kafka.

“How do you make sure a job runs exactly once?”

You don't get true exactly-once from a queue — delivery is at-least-once (a crash between the side effect and the ack causes redelivery). You get effectively-once by making the consumer idempotent: derive an idempotency key from the message (job id, or a business key), record it when you process, and skip the work if you have already done it. Commit the side effect and the idempotency record together (or in a way the dedupe check covers), and commit before acking so a crash redelivers rather than drops. FIFO queues add content-based de-duplication within a window, which helps, but the durable guarantee comes from idempotent consumers, not the queue.

Worked example

Setup. When a user uploads a video, it must be transcoded into several resolutions — minutes of work. The upload request can't block on it, and transcoding must be reliable even when workers crash or a bad file keeps failing.

The move. Decouple with a message queue. The upload handler stores the file in S3, enqueues a transcode job, and returns 202 Accepted immediately. A pool of workers pulls jobs and transcodes at its own pace, scaling independently of upload traffic. The user never waits on the slow part.

Per-message reliability. Each job uses a visibility timeout longer than the transcode time, so two workers never grab the same one. The worker acks only after the renditions are written — so if it crashes mid-transcode, the visibility timeout lapses and the message is redelivered and retried. Because delivery is at-least-once, the worker is idempotent: keyed on the upload id, it skips work already done, so a redelivery is harmless.

Retries + dead-letter. Transient failures retry with backoff. But a poison message — a corrupt file that always fails — would otherwise be retried forever and clog the pipeline, so after a max-receive count it's routed to a dead-letter queue with an alert for a human to inspect.

Bounding + scaling. A queue smooths bursts; it doesn't add capacity. If arrivals outpace drain on average, depth grows without bound, so I alert on queue depth and autoscale workers on depth, and if I genuinely can't keep up I apply backpressure at the producer rather than let the queue grow unbounded.

What breaks. The naive version transcodes inside the request (times out) or assumes exactly-once (double-transcodes on redelivery). Idempotent workers + visibility timeout + DLQ are what make it correct. I'd use SQS here — managed, effectively infinite — not Kafka, because I don't need retention, replay, or many consumer groups for a transcode task queue.

The result. Instant upload responses, transcoding that survives worker crashes and poison files, automatic burst absorption via depth-based autoscaling, and at-least-once delivery made effectively-once by idempotent workers.

Cheat sheet

•Message queue = buffer that decouples producer from workers; enqueue and return, process async.
•Drained, not retained: gone after ack, no replay (that is the difference from Kafka).
•At-least-once delivery → consumers MUST be idempotent. Commit side effect before ack.
•Visibility timeout hides in-flight messages (set > processing time); crash → redelivery.
•Retries with backoff + max-receive → dead-letter queue + alert. Every queue has a DLQ.
•A queue smooths bursts, it does not add capacity: if arrivals outpace drain on average, depth grows without bound. Alert on depth, autoscale workers.
•Standard = high throughput, best-effort order; FIFO = strict order + dedupe, lower throughput.
•SQS = managed, infinite, fire-and-forget. RabbitMQ = self-hosted, rich routing/priorities. Kafka for logs.

Drills

Why must a message-queue consumer be idempotent?

Because delivery is at-least-once: a worker can pull a message, complete the side effect, then crash before acknowledging — so the visibility timeout lapses and the message is redelivered and processed again. A short visibility timeout relative to processing time causes the same double-processing. If the consumer is not idempotent, that means a double-charge, double-email, or double-write. Idempotency — dedupe on a message id or business key before applying the side effect — makes a redelivery a harmless no-op, which is the only way to get effectively-once behavior from an at-least-once system.

Interviewer: "your queue depth is growing without bound. What's happening and what do you do?"

Sustained overload: the average arrival rate exceeds the worker drain rate, so the queue has no steady state and depth grows without limit. A queue is a smoother, not extra capacity. First, scale out consumers (autoscale on depth) to raise the drain rate, and verify the workers aren't blocked on a slow downstream dependency (in which case fix that). If you genuinely cannot keep up, apply backpressure at the producer: reject or throttle new work (429 + Retry-After) rather than letting the queue grow until messages age out or cost explodes. Never just wait and hope it drains; a growing queue with no plan is hiding a capacity problem.

When is a queue the wrong tool for the job?

When the work must happen synchronously within a tight latency budget — adding a queue guarantees you blow a sub-500ms SLA, since you have introduced enqueue, poll, and processing latency. When you need retention or replay, or many independent consumers of the same stream, or very high-throughput event streaming — those are Kafka's job; a drained queue loses the message after ack. And when you need strict total ordering at high throughput — a single FIFO queue serializes everything and caps throughput. A queue is specifically for asynchronous, independently-processable tasks where at-least-once delivery with retries is the right reliability model.
