Backpressure and queueing
Bounded queues, slow consumers, overload behavior, and Little's Law in live systems.
Little's Law isn't optional — it's why your queue grows without bound when the producer outruns the consumer. Unbounded queues are a bug, not a feature.
Read this if your last attempt…
- You put a queue between services and assumed it solves overload
- Your design has no admission control or load shedding
- You can't state what happens when the consumer can't keep up
- You haven't heard of Little's Law
The concept
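Little's Law: L = λ × W. Average items in the queue (L) = average arrival rate (λ) × average time an item spends in the queue (W). The law describes a stable system. If λ_arrive > λ_drain on a sustained basis, there is no steady state and L grows without bound. Period.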
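To make the arithmetic concrete, here is a minimal sketch. The function names and the rates (400, 500, and 600 jobs/s) are illustrative assumptions, and the backlog calculation is a simple constant-rate approximation rather than a queueing model.

```python
def little_l(arrival_rate: float, avg_wait_s: float) -> float:
    """Little's Law: average items in the queue L = arrival rate * average wait."""
    return arrival_rate * avg_wait_s


def backlog_after(arrival_rate: float, drain_rate: float,
                  seconds: float, start: float = 0.0) -> float:
    """Queue depth after `seconds` of constant arrival and drain rates."""
    growth_per_s = arrival_rate - drain_rate
    return max(0.0, start + growth_per_s * seconds)


# Stable: 400 jobs/s arrive and each waits 0.5 s on average.
print(little_l(400, 0.5))            # 200.0 items queued on average

# Overloaded: 600 jobs/s arrive, 500 jobs/s drain, sustained for 10 minutes.
print(backlog_after(600, 500, 600))  # 60000.0, and still climbing
```

In the overloaded case no queue size is "big enough": the depth is a linear function of how long the overload lasts.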
A queue between a fast producer and a slow consumer is not a solution — it's a delay mechanism that transforms "service now overloaded" into "service overloaded with latency". The queue buys you:
- Burst absorption (seconds to minutes of decoupling).
- Smoothing (producer doesn't see consumer hiccups).
Every queue has a max depth. When it is full, the producer gets an error and can propagate 429 upstream. Unbounded queues are the anti-pattern.
Full-queue policies.
| Policy | What it does | When to use |
|---|---|---|
| Reject new (fail-fast) | Producer gets error; propagate 429 upstream | Default; most APIs |
| Block publish | Producer waits; propagates latency upstream | Internal tight-coupled pipelines where loss is worse than slowness |
| Drop-newest | Lose the most recent message | Metrics, non-critical analytics where recency is replaceable |
| Drop-oldest | Make room by evicting the oldest | Live telemetry where stale data is useless |
| Priority shed | Drop low-priority; keep high | Mixed-workload systems (free vs paid, interactive vs batch) |
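Three of the policies in the table (reject, drop-newest, drop-oldest) can be sketched on a plain in-memory buffer. `BoundedBuffer`, `QueueFull`, and the policy names are hypothetical, chosen for illustration; block-on-full and priority shedding are not covered by this sketch.

```python
from collections import deque


class QueueFull(Exception):
    """Raised by the reject policy; a caller could map this to HTTP 429."""


class BoundedBuffer:
    def __init__(self, max_depth: int, policy: str = "reject"):
        if policy not in {"reject", "drop_newest", "drop_oldest"}:
            raise ValueError(f"unknown policy: {policy}")
        self.items = deque()
        self.max_depth = max_depth
        self.policy = policy
        self.dropped = 0  # export as a metric so loss is visible

    def publish(self, item) -> bool:
        """Return True if the item was enqueued, False if it was dropped."""
        if len(self.items) < self.max_depth:
            self.items.append(item)
            return True
        if self.policy == "reject":
            raise QueueFull(f"depth {len(self.items)} at max {self.max_depth}")
        self.dropped += 1
        if self.policy == "drop_oldest":
            self.items.popleft()   # evict the stalest item to make room
            self.items.append(item)
            return True
        return False               # drop_newest: discard the incoming item


buf = BoundedBuffer(max_depth=3, policy="drop_oldest")
for i in range(5):
    buf.publish(i)
print(list(buf.items), buf.dropped)  # [2, 3, 4] 2
```

Whichever policy is chosen, the point is that the behavior at max depth is an explicit decision in code rather than an accident of memory limits.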
How interviewers grade this
- Every queue has a max depth and a policy when full (reject, drop, block).
- You state sustained producer rate vs consumer rate; the consumer rate has to at least match the producer rate.
- Queue depth is an alerting metric.
- Your system has admission control or load shedding at the edge.
- You never describe a queue as a capacity buffer — only as a burst absorber.
Variants
Bounded queue + reject-on-full
Fixed max depth; producers see errors; propagate 429.
The safe default. When the queue fills, the system fails visibly at the entry point instead of silently growing latency. Enables rate-limiting + SLO-based alerting.
Pros
- Bounded memory
- Failure is visible and actionable
- Works with retry-after
Cons
- Producers must handle errors
- Requires capacity planning to avoid over-rejection
Choose this variant when
- Most APIs
- Any system with an SLO
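A minimal sketch of reject-on-full at an API entry point, using the standard library's `queue.Queue`. The handler name `handle_submit`, the max depth of 1000, the assumed drain rate, and the way `Retry-After` is estimated are all illustrative assumptions, not a prescribed design.

```python
import math
import queue

jobs = queue.Queue(maxsize=1000)   # bounded: the max depth is explicit
DRAIN_RATE_PER_S = 500.0           # assumed sustained consumer throughput


def handle_submit(job: dict) -> tuple:
    """Return (status_code, headers) for a submit request."""
    try:
        jobs.put_nowait(job)       # never blocks; raises queue.Full at max depth
    except queue.Full:
        # Rough hint: how long the current backlog would take to drain.
        retry_after = max(1, math.ceil(jobs.qsize() / DRAIN_RATE_PER_S))
        return 429, {"Retry-After": str(retry_after)}
    return 202, {}
```

The failure shows up at the entry point as a status code the client can act on, instead of as latency that grows silently behind an accepted request.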
Bounded queue + block-on-full
Producer blocks until space frees up; slowness propagates upstream.
The "hold still and wait your turn" strategy. Useful when losing a message is worse than letting slowness propagate (internal pipelines, financial feeds).
Pros
- No loss
- Natural backpressure propagation
- Simple to implement
Cons
- Latency spikes visible upstream
- Can deadlock if producer and consumer are mutually dependent
Choose this variant when
- Internal tightly-coupled pipelines
- No-loss is a hard requirement
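Block-on-full can be sketched with `queue.Queue.put`, which blocks until a slot frees up. The timeout here is an added assumption: it keeps a blocked producer from hanging forever if the consumer stalls. The queue size, delays, and function names are illustrative.

```python
import queue
import threading
import time

pipe = queue.Queue(maxsize=2)


def publish(item, timeout_s: float = 1.0) -> bool:
    """Block until there is space; give up after timeout_s instead of hanging."""
    try:
        pipe.put(item, timeout=timeout_s)
        return True
    except queue.Full:
        return False


def slow_consumer(n: int, delay_s: float, out: list) -> None:
    for _ in range(n):
        out.append(pipe.get())
        time.sleep(delay_s)   # simulated slow processing


consumed = []
worker = threading.Thread(target=slow_consumer, args=(5, 0.05, consumed))
worker.start()
results = [publish(i) for i in range(5)]   # producer is slowed to the consumer's pace
worker.join()
print(results, consumed)
```

Once the two slots are full, each further `publish` call waits for the consumer, so the producer's own latency rises. That rising latency is the backpressure signal traveling upstream.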
Priority load-shedding
Classify work by priority; drop low first under load.
Interactive user requests keep going; batch jobs shed first. Free tier shed before paid. Needs a priority attribute on every request and admission control at the entry point.
Pros
- High-value work preserved under overload
- Graceful degradation
- Works with class-based SLOs
Cons
- Priority-assignment logic can be politically contested
- Complex to get right
Choose this variant when
- Mixed-workload systems
- Multi-tier (free/paid) products
- Latency-sensitive + batch combined
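One way to express priority shedding is an admission check that compares queue utilization against a per-class threshold. The priority classes and the 50% / 80% / 100% thresholds below are hypothetical values chosen for illustration.

```python
from enum import IntEnum


class Priority(IntEnum):
    BATCH = 0
    FREE_INTERACTIVE = 1
    PAID_INTERACTIVE = 2


# Hypothetical thresholds: fraction of max depth at which a class is shed.
SHED_AT = {
    Priority.BATCH: 0.50,
    Priority.FREE_INTERACTIVE: 0.80,
    Priority.PAID_INTERACTIVE: 1.00,
}


def admit(priority: Priority, depth: int, max_depth: int) -> bool:
    """Admission control at the entry point: shed low-priority work first."""
    utilization = depth / max_depth
    return utilization < SHED_AT[priority]


for depth in (4_000, 6_000, 9_000, 10_000):
    admitted = [p.name for p in Priority if admit(p, depth, 10_000)]
    print(depth, admitted)
```

As depth climbs, batch work is turned away first, then free-tier interactive work, and paid interactive work is only rejected when the queue is completely full.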
Worked example
Scenario: video-upload processing pipeline. Producers = upload API. Consumers = transcoding workers.
Normal operation:
- Upload API accepts file, writes to S3, publishes a transcode job to SQS.
- 50 workers consume at ~10 jobs/s each = 500 jobs/s sustained.
Under surge (10× normal):
- Queue depth starts climbing — workers are saturated.
- Alert at depth > 10k (≈ 20 s of backlog).
- Auto-scale policy: if depth > 5k for 2 min, scale workers +50%.
- If scaling can't keep up: at depth > 50k, API returns 503 with Retry-After to new uploads — admission control protects the system from collapse.
Free-tier priority shedding:
- Workers consume from two SQS queues: `transcode-paid` and `transcode-free`.
- Paid is drained first; free yields when paid depth > 1k.
- Under extreme load, `transcode-free` can have depth in the hundreds of thousands; eventual processing is fine.
Key: every queue has an alert on depth. Queue depth is the leading indicator of capacity problems.
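The thresholds in this scenario can be checked with a few lines of arithmetic. This is a sketch using the numbers from the example; the function names and action labels are hypothetical, and the real auto-scaling and alerting would live in the platform's own tooling.

```python
WORKERS = 50
JOBS_PER_WORKER_PER_S = 10
DRAIN_RATE = WORKERS * JOBS_PER_WORKER_PER_S   # 500 jobs/s sustained

SCALE_DEPTH = 5_000     # scale workers +50% if exceeded for 2 min
ALERT_DEPTH = 10_000    # alert threshold
REJECT_DEPTH = 50_000   # API returns 503 + Retry-After


def backlog_seconds(depth: int, drain_rate: float = DRAIN_RATE) -> float:
    """Time to drain the current backlog if no new jobs arrived."""
    return depth / drain_rate


def actions(depth: int, seconds_above_scale_depth: float) -> list:
    """Which defences are active at this depth."""
    out = []
    if depth > SCALE_DEPTH and seconds_above_scale_depth >= 120:
        out.append("scale_workers_+50%")
    if depth > ALERT_DEPTH:
        out.append("alert")
    if depth > REJECT_DEPTH:
        out.append("reject_503_retry_after")
    return out


print(backlog_seconds(ALERT_DEPTH))    # 20.0 seconds of backlog
print(backlog_seconds(REJECT_DEPTH))   # 100.0 seconds of backlog
print(actions(60_000, 180))
```

Expressing each threshold as seconds of backlog rather than a raw count makes it easier to judge whether the alert and reject levels match the latency users will tolerate.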
Good vs bad answer
Interviewer probe
“What happens when your consumer can't keep up?”
Weak answer
"The queue absorbs it. That's why we have a queue."
Strong answer
"Queue absorbs bursts — seconds to minutes of backlog. Not sustained overload. Little's Law: if producer rate > consumer rate on average, queue grows forever. Our defences: (1) queue has a max depth; at 80% we auto-scale workers; (2) at max, producer sees errors and the API returns 503 Retry-After to new users — admission control protects the system. Queue depth is on the alert dashboard. We NEVER treat the queue as infinite capacity."
Why it wins: Names Little's Law, distinguishes burst from sustained, names the defences, and flags the anti-pattern explicitly.
When it comes up
- You placed a queue between a fast producer and a slower consumer
- The interviewer asks "what happens under a traffic spike?"
- Any pipeline that can fall behind — uploads, transcoding, notifications
- A mixed workload where some traffic matters more than the rest
Order of reveal
1. Invoke Little’s Law. L = λ × W. If the arrival rate beats the drain rate on average, the queue grows without bound: a queue smooths bursts, it does not add capacity.
2. Bound every queue. Each queue gets a max depth and an explicit policy when full: reject, block, drop, or shed.
3. Auto-scale on depth. Queue depth (and message age) drive autoscaling of the consumer fleet before backpressure ever engages.
4. Admission control at the edge. When scaling cannot keep up, reject new work at the entry point with 503 + Retry-After, which is far better than cascading failures inward.
5. Shed by priority. Under sustained overload, drop low-priority / free-tier work first so the high-value path survives.
Signature phrases
- “A queue smooths bursts; it does not add capacity.” — Corrects the most common misuse of queues in interviews.
- “Little's Law: if arrival beats drain, the queue grows without bound.” — Grounds the answer in the governing math.
- “Every queue has a max depth and a policy when full.” — Shows you treat unbounded queues as the bug they are.
- “Shed low-priority work at the edge before the system collapses inward.” — Demonstrates graceful degradation under overload.
Likely follow-ups
“Reject, block, or drop when the queue is full: how do you choose?”
By workload. Reject (429/503) for user-facing APIs — visible, retryable failure beats silent latency. Block for internal tightly-coupled pipelines where losing a message is worse than propagating slowness upstream. Drop-oldest for telemetry/metrics where the freshest data is the most valuable and stale points are noise. The wrong default is an unbounded queue, which silently chooses "grow until OOM".
“Which signal tells you about overload first?”
Queue depth and message age, well before latency or error rate move. By the time p99 latency spikes or requests start timing out, you are already deep in trouble. Depth is the leading indicator: alert on it (and on oldest-message-age, which catches a stalled consumer even when depth looks flat).
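A minimal sketch of why oldest-message age catches a stalled consumer that depth alone misses. `MonitoredQueue`, `overload_signals`, and the thresholds are hypothetical; a managed queue service would normally expose equivalent metrics itself, and a real queue would also enforce a max depth, which is left out here to keep the focus on the two signals.

```python
import time
from collections import deque
from typing import Optional


class MonitoredQueue:
    """In-memory queue that records enqueue time so message age can be measured."""

    def __init__(self):
        self._items = deque()   # entries are (enqueued_at, payload)

    def put(self, payload, now: Optional[float] = None) -> None:
        stamp = time.monotonic() if now is None else now
        self._items.append((stamp, payload))

    def get(self):
        return self._items.popleft()[1]

    def depth(self) -> int:
        return len(self._items)

    def oldest_age_s(self, now: Optional[float] = None) -> float:
        if not self._items:
            return 0.0
        now = time.monotonic() if now is None else now
        return now - self._items[0][0]


def overload_signals(q: MonitoredQueue, depth_alert: int,
                     age_alert_s: float, now: Optional[float] = None) -> list:
    signals = []
    if q.depth() > depth_alert:
        signals.append("depth")
    if q.oldest_age_s(now) > age_alert_s:
        signals.append("oldest_message_age")
    return signals


# Stalled consumer: only 3 messages queued, but nothing has been consumed for 60 s.
q = MonitoredQueue()
for i in range(3):
    q.put(i, now=0.0)
print(overload_signals(q, depth_alert=100, age_alert_s=30.0, now=60.0))
```

Here depth stays far below its threshold, yet the age signal fires because the head of the queue has not moved.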
“Your producer and consumer call each other and the queue fills. What happens?”
A backpressure deadlock: A blocks publishing to a full queue that B drains, but B is blocked waiting on A. Break the cycle with timeouts on publish, a circuit breaker that fails fast when the downstream is saturated, or a priority bypass for the dependency call. The general rule: never let two services mutually block on each other’s bounded queues.
Common mistakes
Unbounded queue: memory grows, latency grows, and eventually the process runs out of memory or everything times out. Bound every queue and choose a policy for when it is full (reject, drop, block).
Treating the queue as capacity: a queue is a smoother, not a multiplier. The sustained consumer rate must be ≥ the sustained producer rate.
No alert on queue depth: queue depth is the earliest indicator of overload. If you don't alert on it, you discover problems via user complaints, which is far too late.
Circular blocking dependency: if service A blocks on a queue to service B, and B calls A, a full queue is a deadlock. Detect and break cycles with timeouts, circuit breakers, or a priority bypass.
Practice drills
Your queue is at 10M messages and growing. What's the problem class?
Sustained overload: producer rate > consumer rate. The fix is to (a) scale consumers to the right size or (b) shed at the producer (reject new work). Scaling consumers buys time, but you also need to confirm that per-consumer throughput will hold up; the consumer itself may be blocked on a slow downstream. Never wait and hope the queue drains.
Interviewer: "You have a fire-and-forget analytics pipeline. Bound the queue and drop-oldest or drop-newest?"
For analytics: drop-oldest. The most recent data is freshest and most valuable. Stale telemetry is noise. Add a metric on drop rate so you can see when you're losing signal. For logging (vs metrics), drop-oldest is still usually right — when you're losing, you want the recent minutes, not the first hour of the outage.
What's the difference between backpressure and load shedding?
Backpressure = the consumer tells the producer to slow down (block on queue, fail-fast on publish, rate-limit signal). Load shedding = the system selectively rejects some work entirely under overload, often by priority. Backpressure is gradual (slow down); shedding is binary per request (accept or drop). Most systems need both.
Cheat sheet
- Little's Law: L = λ × W. Consumer rate must be ≥ producer rate on average.
- Every queue has a max depth + full-policy (reject / block / drop / shed).
- Alert on queue depth, the earliest overload signal.
- Admission control at the edge beats cascading failures inside.
- Load shed by priority when mixed workloads exist.
- Queue smooths; it doesn't multiply capacity.