Message Queue (SQS / RabbitMQ)
A buffer between producers and workers that decouples them, absorbs bursts, and processes work asynchronously with built-in retries and dead-letter handling — the simple task-queue answer when you do not need a full event log.
Also worth naming: Amazon SQS · RabbitMQ · ActiveMQ · Google Pub/Sub · Azure Service Bus
A message queue is the "do this later, reliably" primitive: enqueue work, return immediately, and let workers process it at their own pace with retries and a dead-letter safety net. Reach for it before reaching for Kafka.
What it is
A message queue is a buffer that sits between a producer (which creates work) and consumers/workers (which do it). The producer drops a message on the queue and returns immediately; workers pull messages, process them, and acknowledge each when done. That simple decoupling is the backbone of asynchronous processing — it lets a system accept work fast and do the slow part in the background, smooth out bursts, and scale producers and consumers independently.
Unlike a streaming log like Kafka, a classic queue is a drained structure: once a message is acknowledged it is gone, there is no replay, and you do not think in partitions and offsets. What you get instead is a lighter, simpler model with per-message lifecycle: a visibility timeout hides an in-flight message so two workers don't both process it, redelivery if a worker crashes before acking, automatic retries with backoff, and a dead-letter queue for messages that keep failing. SQS is the fully-managed, effectively-infinite AWS queue (fire-and-forget background jobs); RabbitMQ is a self-hosted broker with rich routing (exchanges, topics, priorities).
In an interview, a message queue is the right answer for background jobs, async work, and burst absorption — send an email, transcode a thumbnail, process an upload, fan a task out to workers — where you need reliable delivery and easy retries but not retention, replay, or many independent consumer groups. If you need those, you reach for Kafka; if you just need "do this later, reliably," you reach for a queue, and you do not stand up Kafka for it.
When to reach for it
Reach for this when…
- Background / asynchronous jobs — email, transcoding, image processing, report generation
- Decoupling a fast producer from slower workers, absorbing traffic bursts
- You need reliable delivery with retries and a dead-letter queue, simply
- Task distribution across a worker pool that scales independently of producers
Not really this pattern when…
- You need retention and replay, or many independent consumers of the same stream (that is Kafka)
- You need very high sustained throughput as an event backbone (Kafka)
- The work must happen synchronously within the request (a queue guarantees you blow a tight latency budget)
- You need strict total ordering at scale (a single FIFO queue caps throughput)
How it works
Four ideas cover almost every interview use:
1. Decouple, then absorb and scale independently. The producer enqueues and returns; workers pull at their own rate. This breaks the tight coupling between accepting work and doing it: a burst of 1,000 requests against workers that handle 200/sec doesn't drop 800 — they wait in the queue and drain. Producers and consumers scale separately, and a worker outage just means the queue grows until workers return.
The producer enqueues a message and moves on; workers pull messages at their own pace and acknowledge each one when done. The queue absorbs bursts and lets you scale producers and consumers independently.
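The burst arithmetic above can be checked with a few lines of simulation. This is a minimal sketch, assuming the whole burst lands in one second and the worker pool drains at a fixed rate; `simulate_depth` is a hypothetical helper, not part of any queue API.

```python
def simulate_depth(arrivals_per_sec, drain_per_sec):
    """Return the queue depth at the end of each second.

    arrivals_per_sec: message counts arriving in each second.
    drain_per_sec: how many messages the worker pool finishes per second.
    Keeps ticking after arrivals stop, until the queue is empty.
    """
    if drain_per_sec <= 0:
        raise ValueError("drain_per_sec must be positive or the queue never empties")
    depth = 0
    history = []
    second = 0
    while second < len(arrivals_per_sec) or depth > 0:
        arriving = arrivals_per_sec[second] if second < len(arrivals_per_sec) else 0
        depth += arriving
        depth -= min(depth, drain_per_sec)  # workers take what they can handle
        history.append(depth)
        second += 1
    return history


# A burst of 1,000 messages against workers that handle 200/sec.
burst = simulate_depth([1000], drain_per_sec=200)
print(burst)
```

Nothing is dropped: the backlog shrinks by 200 each second and is gone after five seconds.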
2. Per-message reliability: visibility timeout, ack, redelivery. When a worker pulls a message it becomes invisible for a timeout so no other worker grabs it. The worker acknowledges on success and the message is deleted; if it crashes before acking, the timeout lapses and the message is redelivered. Commit the side effect before you ack, and you get at-least-once delivery — which is why workers must be idempotent.
3. Retries and the dead-letter queue. Transient failures are retried (often with backoff). But a poison message — one that always fails (malformed, references deleted data) — would otherwise be retried forever and block the queue. A max-receive count routes it to a dead-letter queue after N attempts, where it is preserved for inspection and an alert fires. Every production queue has a DLQ with a threshold and an alarm.
A pulled message is hidden (visibility timeout) until the worker acks it. If the worker crashes, the timeout lapses and the message is redelivered. After N failed attempts a poison message is routed to a dead-letter queue and an alert fires, instead of blocking the queue forever.
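The lifecycle described above (hide on receive, delete on ack, redeliver on timeout, dead-letter after N attempts) can be sketched as a toy in-memory queue. This is an illustration under stated assumptions, not how SQS or RabbitMQ are implemented: `InMemoryQueue` and its methods are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```python
import itertools


class InMemoryQueue:
    """Toy queue with a visibility timeout, receive counts and a dead-letter list."""

    def __init__(self, visibility_timeout, max_receives):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.messages = {}      # id -> {"body", "visible_at", "receives"}
        self.dead_letters = []  # bodies of messages that exceeded max_receives
        self._ids = itertools.count(1)

    def send(self, body, now=0.0):
        msg_id = next(self._ids)
        self.messages[msg_id] = {"body": body, "visible_at": now, "receives": 0}
        return msg_id

    def receive(self, now):
        """Return (id, body) for one visible message, or None if there is none."""
        for msg_id, msg in list(self.messages.items()):
            if msg["visible_at"] > now:
                continue  # still in flight with another worker
            if msg["receives"] >= self.max_receives:
                # Poison message: set it aside instead of redelivering it.
                self.dead_letters.append(msg["body"])
                del self.messages[msg_id]
                continue
            msg["receives"] += 1
            msg["visible_at"] = now + self.visibility_timeout
            return msg_id, msg["body"]
        return None

    def ack(self, msg_id):
        """Delete the message; call only after the side effect is committed."""
        self.messages.pop(msg_id, None)


q = InMemoryQueue(visibility_timeout=30, max_receives=3)
q.send("resize image 42")
first = q.receive(now=0)     # delivered, hidden until t=30
hidden = q.receive(now=10)   # None: still in flight
again = q.receive(now=31)    # never acked, so it is redelivered
```

A worker that succeeds would call `q.ack(msg_id)` and the message would be gone. A message that is never acked is redelivered until its receive count reaches `max_receives`, then lands in `dead_letters`.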
3b. Bound the queue and watch its depth. A queue is a smoother, not infinite capacity: if the average arrival rate exceeds the average drain rate, depth grows without bound (and Little's Law, L = λW, ties that depth to how long messages wait). Queue depth is the earliest overload signal: alert on it and autoscale workers; if you genuinely can't keep up, apply backpressure at the producer.
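For reference, the two relations behind this point can be written out. The symbols are assumptions introduced here, not taken from the text: λ is the average arrival rate, μ the average drain rate, N(t) the queue depth, L the average number of messages in the system and W the average time a message spends in it.

```latex
\frac{dN}{dt} \approx \lambda - \mu
\quad\Rightarrow\quad
N(t) \to \infty \ \text{ if } \lambda > \mu
\qquad\text{and}\qquad
L = \lambda W \ \text{ (Little's Law, steady state)}
```

As an illustrative calculation: with λ = 250 messages/sec against μ = 200 messages/sec, the backlog grows by about 50 messages every second, or 180,000 per hour. In a stable system taking 200 messages/sec where each message spends 2 seconds in the system, Little's Law gives L = 200 × 2 = 400 messages in flight or waiting.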
4. Queue vs log — pick by need. SQS/RabbitMQ are queues: drained, per-message ack, easy retries/DLQ, no replay, light to operate. Kafka is a log: retained, replayable, partitioned, many consumer groups, higher throughput, heavier. Need a task queue → SQS/RabbitMQ. Need an event stream → Kafka. Don't run Kafka for a job queue, and don't try to make a queue replay history.
Performance envelope
Message queue characteristics — what to reason about.
| Dimension | Queue (SQS / RabbitMQ) | Why it matters |
|---|---|---|
| Model | Drained — gone after ack | No replay; simpler than a log |
| Delivery | At-least-once (FIFO option for exactly-once-ish) | Workers must be idempotent |
| Throughput | High (SQS ~unlimited std); moderate (RabbitMQ) | Plenty for task queues; not a Kafka-scale backbone |
| Ordering | Best-effort (std) / strict (FIFO, lower throughput) | Strict order caps parallelism |
| Retries / DLQ | Built-in: visibility timeout, max-receive, DLQ | Reliability without custom code |
| Ops burden | SQS: none (managed); RabbitMQ: you run it | SQS for simplicity; RabbitMQ for routing control |
Capabilities in interviews
Asynchronous background jobs
Accept a request fast, return immediately, and do the slow work in a worker.
The canonical use. Anything that takes more than a moment or doesn't need to block the response goes on a queue:
POST /signup → write user, enqueue "send welcome email" → 200 now
worker: pull → send email → ack
Email, thumbnail generation, PDF/report rendering, search indexing, webhook delivery: all classic queue jobs. The user gets an instant response; the slow part happens reliably in the background with retries if it fails. This is the default pattern for "do X, but not in the request."
Choose this variant when
- Email / notifications / webhooks
- Media processing, report generation
- Any work that should not block the response
Burst absorption & load levelling
Buffer spiky inbound traffic so steady-rate workers are never overwhelmed.
When inbound traffic spikes far above what downstream can handle, the queue absorbs the surge and workers drain it at a sustainable rate:
upload spike 10× → queue grows → workers (+ autoscale on depth) drain it
Instead of dropping requests or melting the database during a spike, the queue holds the work and smooths it out. Pair with autoscaling on queue depth so the worker fleet grows during the surge and shrinks after: load levelling without over-provisioning for peak.
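One way to turn "autoscale on depth" into a number is to size the fleet so the current backlog drains within a target time. The sketch below is one assumed policy among many; `desired_workers` and its parameters are hypothetical and do not correspond to any cloud provider's autoscaling API.

```python
import math


def desired_workers(queue_depth, per_worker_rate, target_drain_seconds,
                    min_workers=1, max_workers=100):
    """Workers needed to drain `queue_depth` messages within the target time.

    per_worker_rate: messages one worker finishes per second (assumed constant).
    The result is clamped so the fleet never scales to zero or past a budget cap.
    """
    capacity_per_worker = per_worker_rate * target_drain_seconds
    needed = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# 12,000 queued uploads, 10 msgs/sec per worker, drain within one minute.
print(desired_workers(12_000, per_worker_rate=10, target_drain_seconds=60))
```

The upper clamp matters as much as the formula: when the cap is hit and depth still grows, scaling alone is no longer keeping up.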
Choose this variant when
- Bursty ingestion or request spikes
- Protecting a slower downstream (DB, third-party API)
- Smoothing load to avoid over-provisioning
Task distribution across workers
Spread a stream of tasks across a worker pool that scales independently.
A queue is a natural work-distribution mechanism: many workers pull from one queue, each taking the next available task, so work is balanced automatically and you scale by adding workers:
queue → [worker × N] (competing consumers)
Add workers to go faster, remove them to save cost. No coordination is needed, because the visibility timeout means each message is held by only one worker at a time. This "competing consumers" pattern is how you parallelize a backlog of independent tasks.
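The competing-consumers shape can be shown in-process with Python's standard `queue.Queue` and threads. This is only an analogy under that assumption: the stdlib queue has no visibility timeout or redelivery, and `run_worker_pool` is a hypothetical helper.

```python
import queue
import threading


def run_worker_pool(tasks, worker_count):
    """Drain `tasks` with `worker_count` competing workers.

    Returns a list of (worker_id, task) pairs recording who handled what.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    processed = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = work.get_nowait()  # each task goes to exactly one caller
            except queue.Empty:
                return                    # backlog drained, worker exits
            with lock:
                processed.append((worker_id, task))
            work.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


results = run_worker_pool(range(100), worker_count=4)
print(len(results), "tasks handled")
```

Changing `worker_count` changes how fast the backlog drains without touching the producer side, which is the point of the pattern.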
Choose this variant when
- Parallelizing a backlog of independent tasks
- Worker pools that scale with depth
- Decoupling task creation from execution
Routing & pub-sub (RabbitMQ)
Route messages to different queues by topic/rule, or fan one message out to many.
RabbitMQ adds rich routing via exchanges: route by routing key, match topic patterns, or broadcast to all bound queues (fan-out):
order.created → [billing queue, shipping queue, analytics queue] (fan-out exchange)
This gives queue simplicity with some of the multi-consumer flexibility of a log, plus priorities and per-message TTLs. For AWS-native fan-out, the equivalent is SNS → multiple SQS queues. When you need routing logic but not a full event log, this is the middle ground.
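Topic-pattern routing can be illustrated with a small matcher. In RabbitMQ topic exchanges, `*` in a binding pattern matches exactly one dot-separated word and `#` matches zero or more words. The code below is a standalone sketch of those matching rules, not RabbitMQ's implementation; `topic_matches`, `route` and the queue names are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if a topic binding pattern matches a routing key."""

    def match(p, k):
        if not p:
            return not k          # pattern exhausted: key must be exhausted too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


def route(bindings, routing_key):
    """Return the names of all queues whose binding matches the routing key."""
    return [name for pattern, name in bindings if topic_matches(pattern, routing_key)]


bindings = [
    ("order.*", "billing"),
    ("order.created", "shipping"),
    ("#", "analytics"),
    ("payment.*", "ledger"),
]
print(route(bindings, "order.created"))
```

One published message reaches every queue with a matching binding, and each of those queues still behaves as an ordinary drained queue with its own workers.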
Choose this variant when
- Routing messages by type to different workers
- Fan-out to several queues (SNS→SQS, RabbitMQ exchanges)
- Priorities or per-message TTL needs
Operating knobs
Queue vs log (SQS/RabbitMQ vs Kafka)
The first decision. Need a task queue — per-message ack, easy retries/DLQ, no replay, light ops? SQS/RabbitMQ. Need an event log — retention, replay, many independent consumer groups, very high throughput? Kafka. Using Kafka for simple background jobs is over-engineering; using a queue where you need replay loses data you needed.
Visibility timeout & idempotency
Set the visibility timeout longer than the worst-case processing time, or a slow message gets redelivered and processed twice. Delivery is at-least-once regardless, so workers must be idempotent (dedupe on a message id / business key). Commit the side effect before acking so a crash redelivers rather than drops.
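A minimal sketch of an idempotent consumer, assuming each message carries an `idempotency_key` field (a hypothetical name). The in-memory set stands in for a durable dedupe table; in a real system the side effect and the key record would need to be committed together.

```python
class IdempotentWorker:
    """Processes each idempotency key at most once, however often it is delivered."""

    def __init__(self):
        self.processed_keys = set()  # stand-in for a durable dedupe table
        self.emails_sent = []        # stand-in for the real side effect

    def handle(self, message):
        key = message["idempotency_key"]
        if key in self.processed_keys:
            return "skipped"         # redelivery: safe to ack without redoing work
        self.emails_sent.append(message["to"])
        self.processed_keys.add(key)
        return "processed"


worker = IdempotentWorker()
msg = {"idempotency_key": "welcome-email:user-17", "to": "user17@example.com"}
outcomes = [worker.handle(msg), worker.handle(msg)]  # second call simulates redelivery
print(outcomes, worker.emails_sent)
```

The worker acks in both cases; the difference is that the redelivered copy produces no second email.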
Retries, backoff & dead-letter queue
Retry transient failures with exponential backoff, but cap attempts with a max-receive count that routes poison messages to a DLQ — otherwise one bad message is retried forever and blocks others. Every queue needs a DLQ and an alarm on its depth; the DLQ is for inspection, not infinite retry.
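Exponential backoff is usually capped and randomized so that retries do not arrive in synchronized waves. The sketch below uses a "full jitter" policy as one common choice; the function names, base and cap values are assumptions for illustration.

```python
import random


def backoff_ceilings(attempts, base=1.0, cap=60.0):
    """Upper bound on the delay before each retry: base * 2^n, capped."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Pick a random delay between 0 and the capped exponential ceiling."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0, ceiling)


print(backoff_ceilings(8))
```

The ceilings double until they reach the cap and then stay flat. The max-receive count, not the backoff, is what finally stops the retries and sends the message to the DLQ.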
Ordering & throughput (standard vs FIFO)
Standard queues are high-throughput but best-effort order and at-least-once. FIFO queues give strict ordering and de-duplication but at much lower throughput. Use standard unless ordering is essential; if you need order, scope it (FIFO message groups per key) so you do not serialize everything.
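Scoping order to a key can be pictured as one small ordered lane per key: messages within a lane stay in sequence, while different lanes can be worked in parallel. SQS FIFO expresses the key as a `MessageGroupId`; the code below is only a conceptual sketch, and `group_by_key` is a hypothetical helper.

```python
from collections import deque


def group_by_key(messages):
    """Split (group_id, body) pairs into per-group lanes, preserving arrival order."""
    lanes = {}
    for group_id, body in messages:
        lanes.setdefault(group_id, deque()).append(body)
    return lanes


events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "paid"),
]
lanes = group_by_key(events)
print({group: list(bodies) for group, bodies in lanes.items()})
```

Each lane must be consumed one message at a time, so throughput scales with the number of distinct keys rather than being serialized through a single ordered stream.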
Versus the alternatives
Message queue vs adjacent options.
| Dimension | SQS / RabbitMQ | Kafka | Redis Streams |
|---|---|---|---|
| Model | Drained queue (delete on ack) | Retained, replayable log | Lightweight log + groups |
| Replay | No — gone after ack | Yes — rewind offset | Limited (bounded, in-memory) |
| Retries / DLQ | Built-in (visibility, max-receive, DLQ) | You build it (offsets + dedupe) | XCLAIM / manual |
| Throughput | High (SQS) / moderate (RabbitMQ) | Very high (100K–1M+/sec) | High (single node) |
| Best for | Background jobs, task queues, bursts | Event streaming, CDC, multi-consumer | Queues without standing up Kafka |
Failure modes & gotchas
Delivery is at-least-once, so a message can be processed more than once (crash before ack, visibility timeout too short, redelivery). A non-idempotent worker double-charges, double-sends, or double-applies. Make consumers idempotent — dedupe on a message id or business key before the side effect.
A message that always fails will be retried forever, consuming workers and (for FIFO/ordered queues) blocking everything behind it. Configure a max-receive count and a DLQ so poison messages are set aside after N attempts and a human is alerted, instead of jamming the pipeline.
A queue smooths bursts; it does not add capacity. If the average arrival rate exceeds the worker drain rate, depth grows without bound, and by Little's Law (L = λW) so does the time each message waits, until messages age out or memory/cost blows up. Autoscale workers on depth, and apply backpressure at the producer when you genuinely cannot keep up.
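Backpressure at the producer can be as simple as bounding the queue and refusing work once it is full. A minimal sketch, assuming an in-process `queue.Queue` stands in for the real queue and that the caller maps the result to an HTTP status; `try_enqueue` is a hypothetical helper.

```python
import queue


def try_enqueue(work_queue, job):
    """Enqueue if there is room, otherwise signal the caller to back off.

    Returns 202 (accepted for async processing) or 429 (too many requests).
    """
    try:
        work_queue.put_nowait(job)
        return 202
    except queue.Full:
        return 429


work = queue.Queue(maxsize=2)
statuses = [try_enqueue(work, job_id) for job_id in range(3)]
print(statuses)
```

Rejecting the third job pushes the overload back to the caller, who can retry later, instead of letting the backlog grow silently.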
If a message takes longer to process than its visibility timeout, the queue assumes the worker died and redelivers it — so a slow job runs twice (or more). Set the timeout above worst-case processing time, or extend it (heartbeat) for long jobs.
A queue is for async work. Inserting one into a path with a tight latency budget (sub-500ms) almost guarantees you blow it — you have added enqueue, poll, and processing latency. Use a queue only when the work can happen out of band; keep synchronous calls synchronous.
In production
Stripe
Reliable asynchronous jobs behind synchronous payments
Stripe processes payments synchronously but offloads the enormous amount of follow-on work — sending receipts and webhooks, updating analytics, syncing to downstream systems, retrying failed external calls — onto asynchronous queues. A charge succeeds in the request path; the dozens of side effects it triggers are enqueued and processed by workers, so the customer-facing API stays fast and the side effects get at-least-once reliability with retries.
Stripe's engineering is also famous for the discipline this page preaches: idempotency everywhere (so a redelivered job never double-charges or double-sends), retries with backoff, and dead-letter handling for jobs that keep failing. It is the canonical illustration of "do the slow, retryable work off the request path on a queue, and make consumers idempotent because delivery is at-least-once."
Airbnb
A firehose of background jobs on managed queues
Airbnb runs a vast amount of asynchronous work — sending messages and notifications, processing images, updating search indexes, computing pricing signals — through queue-backed job systems processing billions of tasks. The pattern is uniform: a user action enqueues work, a fleet of workers drains it, and the system autoscales workers on queue depth so a surge (a holiday booking spike) is absorbed by the queue and worked off at a sustainable rate rather than overwhelming downstream stores.
Their use is the textbook "queue as a load-leveling buffer plus task distributor": producers and consumers scale independently, bursts are smoothed, failed jobs retry and ultimately dead-letter, and the same job system handles thousands of distinct task types. It shows the message queue as the workhorse decoupling primitive that nearly every large product runs on.
Good vs bad answer
Interviewer probe
“When a user uploads a video, you must transcode it into several resolutions — which takes minutes. How does the upload request work, and how is the transcoding done reliably?”
Weak answer
"The upload endpoint transcodes the video before responding, looping over each resolution, then returns success once all the renditions are ready."
Strong answer
"The request must not block on minutes of transcoding, so I decouple with a message queue. The upload handler stores the file (S3) and enqueues a transcode job, then returns 202 immediately. A pool of workers pulls jobs from the queue and transcodes at its own pace, scaling independently of upload traffic. Each job uses a visibility timeout longer than the transcode time so two workers don't grab the same one, and the worker acks only after the renditions are written — so if it crashes mid-transcode, the message is redelivered and retried. Because delivery is at-least-once, the worker is idempotent (keyed on upload id, it skips work already done). A failing job retries with backoff, and after a max-receive count a poison job goes to a dead-letter queue with an alert instead of jamming the pipeline. I autoscale the worker fleet on queue depth so a surge of uploads is absorbed and drained. I'd use SQS here — managed, effectively infinite, perfect for fire-and-forget background jobs; I wouldn't reach for Kafka because I don't need retention, replay, or many consumer groups for a transcode task queue."
Why it wins: Decouples with a queue and returns immediately, covers the full reliability story (visibility timeout, ack-after-side-effect, redelivery, idempotency, retries, DLQ), autoscales on depth, and explicitly picks a queue over Kafka with the right reasoning.
Interview playbook
When it comes up
- Background / async work — email, transcoding, image processing, reports, webhooks
- A slow operation that should not block the request
- Bursty traffic that would overwhelm a downstream if applied directly
- The interviewer asks "how do you do this work without making the user wait?"
Order of reveal
1. Decouple and return fast. The producer enqueues the work and returns immediately; workers process it asynchronously at their own pace.
2. Reliability per message. Visibility timeout hides in-flight messages, the worker acks after the side effect, and a crash redelivers: at-least-once.
3. Idempotent consumers. Because it is at-least-once, workers dedupe on a message/business id so a redelivery is harmless.
4. Retries + DLQ. Transient failures retry with backoff; after a max-receive count a poison message goes to a dead-letter queue with an alert.
5. Bound + autoscale. I alert on queue depth and autoscale workers; a queue smooths bursts, it does not add capacity.
Signature phrases
- “A queue is "do this later, reliably" — enqueue and return.” — States the core value in one line.
- “At-least-once delivery, so consumers must be idempotent.” — The non-negotiable correctness point.
- “Every queue has a dead-letter queue with a max-receive and an alarm.” — Operational maturity interviewers look for.
- “A queue smooths bursts; it does not add capacity.” — Avoids the classic infinite-buffer mistake.
Likely follow-ups
“A message keeps failing. Walk me through what happens.”
On each failure the worker does not ack, so the visibility timeout lapses and the message is redelivered, retried with backoff. A receive counter increments each attempt; once it crosses the configured max-receive count, the queue stops redelivering it to the main consumers and routes it to a dead-letter queue, where it is preserved for inspection and an alert pages the owning team. This is critical because without it a single poison message — malformed, or referencing deleted data so it can never succeed — would be retried forever, burning worker capacity and, on an ordered queue, blocking every message behind it. The DLQ is for diagnosis, not infinite retry.
“When do you choose Kafka over SQS/RabbitMQ?”
When I need what a log provides and a queue does not: retention and replay (rewind and reprocess after a bug), many independent consumer groups reading the same stream (billing, search, analytics all consuming one event stream), ordered partitions, or very high sustained throughput as an event backbone. A drained queue gives none of those — once a message is acked it is gone, and fan-out needs SNS/exchanges. Conversely, for simple background jobs with easy retries and a DLQ, Kafka is heavier than the problem warrants. Rule of thumb: task queue → SQS/RabbitMQ; event stream / replay / multi-consumer → Kafka.
“How do you make sure a job runs exactly once?”
You don't get true exactly-once from a queue — delivery is at-least-once (a crash between the side effect and the ack causes redelivery). You get effectively-once by making the consumer idempotent: derive an idempotency key from the message (job id, or a business key), record it when you process, and skip the work if you have already done it. Commit the side effect and the idempotency record together (or in a way the dedupe check covers), and commit before acking so a crash redelivers rather than drops. FIFO queues add content-based de-duplication within a window, which helps, but the durable guarantee comes from idempotent consumers, not the queue.
Worked example
Setup. When a user uploads a video, it must be transcoded into several resolutions — minutes of work. The upload request can't block on it, and transcoding must be reliable even when workers crash or a bad file keeps failing.
The move. Decouple with a message queue. The upload handler stores the file in S3, enqueues a transcode job, and returns 202 Accepted immediately. A pool of workers pulls jobs and transcodes at its own pace, scaling independently of upload traffic. The user never waits on the slow part.
Per-message reliability. Each job uses a visibility timeout longer than the transcode time, so two workers never grab the same one. The worker acks only after the renditions are written — so if it crashes mid-transcode, the visibility timeout lapses and the message is redelivered and retried. Because delivery is at-least-once, the worker is idempotent: keyed on the upload id, it skips work already done, so a redelivery is harmless.
Retries + dead-letter. Transient failures retry with backoff. But a poison message — a corrupt file that always fails — would otherwise be retried forever and clog the pipeline, so after a max-receive count it's routed to a dead-letter queue with an alert for a human to inspect.
Bounding + scaling. A queue smooths bursts, it doesn't add capacity: if arrivals outpace the drain rate, depth and wait time both climb (Little's Law ties the two together). So I alert on queue depth and autoscale workers on depth, and if I genuinely can't keep up I apply backpressure at the producer rather than let the queue grow unbounded.
What breaks. The naive version transcodes inside the request (times out) or assumes exactly-once (double-transcodes on redelivery). Idempotent workers + visibility timeout + DLQ are what make it correct. I'd use SQS here — managed, effectively infinite — not Kafka, because I don't need retention, replay, or many consumer groups for a transcode task queue.
The result. Instant upload responses, transcoding that survives worker crashes and poison files, automatic burst absorption via depth-based autoscaling, and at-least-once delivery made effectively-once by idempotent workers.
Cheat sheet
- Message queue = buffer that decouples producer from workers; enqueue and return, process async.
- Drained, not retained: gone after ack, no replay (that is the difference from Kafka).
- At-least-once delivery → consumers MUST be idempotent. Commit side effect before ack.
- Visibility timeout hides in-flight messages (set > processing time); crash → redelivery.
- Retries with backoff + max-receive → dead-letter queue + alert. Every queue has a DLQ.
- A queue smooths bursts, it does not add capacity (arrival rate > drain rate means unbounded depth). Alert on depth, autoscale workers.
- Standard = high throughput, best-effort order; FIFO = strict order + dedupe, lower throughput.
- SQS = managed, infinite, fire-and-forget. RabbitMQ = self-hosted, rich routing/priorities. Kafka for logs.
Drills
Why must a message-queue consumer be idempotent?
Because delivery is at-least-once: a worker can pull a message, complete the side effect, then crash before acknowledging — so the visibility timeout lapses and the message is redelivered and processed again. A short visibility timeout relative to processing time causes the same double-processing. If the consumer is not idempotent, that means a double-charge, double-email, or double-write. Idempotency — dedupe on a message id or business key before applying the side effect — makes a redelivery a harmless no-op, which is the only way to get effectively-once behavior from an at-least-once system.
Interviewer: "your queue depth is growing without bound. What's happening and what do you do?"
Sustained overload: the average arrival rate exceeds the worker drain rate, so the depth grows without limit, and by Little's Law the time each message waits grows with it. A queue is a smoother, not extra capacity. First, scale out consumers (autoscale on depth) to raise the drain rate, and verify the workers aren't blocked on a slow downstream dependency (in which case fix that). If you genuinely cannot keep up, apply backpressure at the producer: reject or throttle new work (429 + Retry-After) rather than letting the queue grow until messages age out or cost explodes. Never just wait and hope it drains; a growing queue with no plan is hiding a capacity problem.
When is a queue the wrong tool for the job?
When the work must happen synchronously within a tight latency budget — adding a queue guarantees you blow a sub-500ms SLA, since you have introduced enqueue, poll, and processing latency. When you need retention or replay, or many independent consumers of the same stream, or very high-throughput event streaming — those are Kafka's job; a drained queue loses the message after ack. And when you need strict total ordering at high throughput — a single FIFO queue serializes everything and caps throughput. A queue is specifically for asynchronous, independently-processable tasks where at-least-once delivery with retries is the right reliability model.