The Delivery Framework
Most candidates fail the system design round not on knowledge but on delivery — they design the wrong system brilliantly, run out of time, or graze ten topics instead of nailing two. This is the choreography of the 45 minutes: the order to move through, how long to spend, and exactly what to say in each phase.
The 45-minute arc
~40 min of design + buffer
The rubric
What interviewers are actually grading
Every company words it differently, but the rubric collapses to four competencies. The framework below is built to surface all four.
Problem navigation
Can you break an ambiguous prompt into the right pieces and prioritise them?
Signal: Scoping to three must-haves and protecting that scope when requirements shift.
Solution design
Can you compose components into a coherent system that meets the requirements?
Signal: A clean request path where every box maps to a requirement, no spaghetti.
Technical excellence
Do you know current technologies and apply them correctly, with real numbers?
Signal: Naming concrete tools (a Redis sorted set, a Postgres GIN index) and their trade-offs.
Communication
Can you explain complex ideas clearly and collaborate when challenged?
Signal: Thinking out loud, inviting correction, and engaging calmly with push-back.
The framework
Six phases, in order
Move through these in sequence. The time budgets are for a 45-minute round — scale them to your slot. Don't skip a phase; each one sets up the next.
Scope the requirements
~5 min
Turn an open-ended prompt into a designable system with three must-have features and numbered non-functional targets.
The first five minutes decide the next forty. The most common reason a strong engineer fails is not a missing algorithm — it is designing the wrong system brilliantly because they never narrowed the prompt.
Separate functional from non-functional. Functional is what the system does (post a tweet, redirect a URL). Non-functional is how well (p99 latency, availability, scale). "Fast and reliable" is neither — it is a non-answer.
Pick exactly three functional must-haves. List everything the prompt implies, then commit to three and explicitly park the rest. Three is enough to be a real system and few enough to design deeply in the time you have. Senior candidates deliver three features thoroughly; juniors sketch eight superficially.
Put a number on every non-functional requirement. "Scalable" → "100M DAU, 20K peak read QPS". "Fast" → "p99 read under 200 ms". The five that apply almost everywhere: latency, availability, durability, consistency, scale. Name all five; quantify at least three.
Bundle your assumptions and invite one correction. "I'll assume consumer, global, two-year retention, read-heavy by 100:1 — push back if any of that is wrong." One sentence beats three minutes of clarifying questions.
Say this
- "My three must-haves are X, Y, Z. SSO and analytics are out of scope for v1."
- "Reads need p99 under 200 ms; writes can take 500 ms but must be durable."
- "I'll assume 100M DAU, read-heavy 100:1. Does that match what you have in mind?"
Avoid this
- "It should be fast, scalable, and reliable." (no numbers = no thought)
- Listing ten features and committing to none of them.
- Asking six clarifying questions before drawing anything.
Estimate the scale
~3 min
Do the back-of-envelope math that changes the design, and skip the math that does not.
Estimation is not arithmetic theatre. Do the numbers that change a decision and say so; skip the rest out loud ("storage is trivial here, I won't size it").
The handful that usually matter:
- Peak QPS from DAU: DAU × actions/user ÷ 86,400, then ×2–3 for peak. This decides whether one database is enough or you shard.
- Storage over the retention window: rows/day × row size × retention. This decides sharding and tiering.
- Bandwidth if you serve media: QPS × payload size. This decides whether a CDN is mandatory.
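The three formulas above can be sketched as small helpers. This is a minimal illustration, not a sizing tool: the function names and every input value (10 reads per user per day, 50M rows/day, 300-byte rows, a two-year window, a 200 KB payload) are hypothetical, chosen only to make the arithmetic visible.

```python
SECONDS_PER_DAY = 86_400


def peak_qps(dau: int, actions_per_user: float, peak_factor: float = 3.0) -> float:
    """Average QPS from daily actions, scaled by a 2-3x peak multiplier."""
    return dau * actions_per_user / SECONDS_PER_DAY * peak_factor


def storage_bytes(rows_per_day: int, row_bytes: int, retention_days: int) -> int:
    """Total bytes accumulated over the retention window."""
    return rows_per_day * row_bytes * retention_days


def bandwidth_bytes_per_sec(qps: float, payload_bytes: int) -> float:
    """Egress needed to serve each request's payload."""
    return qps * payload_bytes


# Hypothetical inputs, for illustration only.
reads_at_peak = peak_qps(dau=100_000_000, actions_per_user=10)
two_year_storage = storage_bytes(50_000_000, 300, 730)
media_egress = bandwidth_bytes_per_sec(1_000, 200_000)

print(f"peak read QPS ~ {reads_at_peak:,.0f}")
print(f"storage ~ {two_year_storage / 1e12:.2f} TB")
print(f"egress ~ {media_egress / 1e6:.0f} MB/s")
```

Each result only earns its place in the interview if it drives a decision: the QPS figure feeds the shard-or-not call, the storage figure feeds tiering, and the egress figure feeds the CDN call.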
Anchor to real capacity so you can right-size. As order-of-magnitude figures that shift with hardware and workload: a single Postgres instance handles ~10K–50K TPS and terabytes of data; Redis does ~100K+ ops/sec; one app server serves ~1K–10K rps. Quoting these stops you from sharding a system that does 1K QPS.
The senior move is to derive a number and immediately use it: "That's 20K write QPS — past a single primary, so I'll shard by user_id." A number you don't act on was a number you didn't need.
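The "derive a number, then act on it" habit can be sketched as a comparison against single-node ceilings. The ceilings below are assumptions: they take the conservative lower bound of each rough range quoted above, and the dictionary keys and function names are hypothetical.

```python
import math

# Conservative single-node ceilings: the lower bounds of the rough ranges
# quoted in the text. Real limits depend on hardware, schema, and workload.
SINGLE_NODE_CEILING = {
    "postgres_tps": 10_000,
    "redis_ops_per_sec": 100_000,
    "app_server_rps": 1_000,
}


def needs_scale_out(load_per_sec: float, resource: str) -> bool:
    """True when the load exceeds what one node is assumed to handle."""
    return load_per_sec > SINGLE_NODE_CEILING[resource]


def nodes_needed(load_per_sec: float, resource: str) -> int:
    """Minimum node count at the assumed ceiling, ignoring headroom."""
    return max(1, math.ceil(load_per_sec / SINGLE_NODE_CEILING[resource]))


# 1K QPS fits on one database; 20K writes/sec does not.
print(needs_scale_out(1_000, "postgres_tps"))
print(needs_scale_out(20_000, "postgres_tps"))
print(nodes_needed(20_000, "postgres_tps"))
```

In a real answer you would add headroom for failover and growth; the point of the sketch is only that the number leads directly to a design decision.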
Say this
- +"20K writes/sec is past one primary, so storage has to shard — that drives the design."
- +"Reads are 100× writes, so this is a caching and read-replica problem, not a sharding one."
- +"Storage is trivial at this scale; I won't spend time sizing it."
Avoid this
- −Grinding through five minutes of arithmetic nobody uses.
- −Quoting 2010-era limits ("a database can only do 1K QPS").
- −Sizing for average load and ignoring the 2–3× peak.
Define the API & core entities
~5 min
Name the nouns and pin down the contract before drawing boxes, so the design has something concrete to satisfy.
Before architecture, establish what the system stores and how clients talk to it. This gives the high-level design a concrete target instead of an abstract shape.
Core entities — the nouns. List the handful of domain objects (User, Tweet, Follow; or ShortLink, ClickEvent). You don't need a full schema yet — just the entities and their key relationships, so the data model has a spine.
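As a rough sketch of what "entities and key relationships" might look like on the whiteboard, here are the three nouns as plain records. The field names are assumptions for illustration; the text only names the entities themselves.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    id: int
    handle: str  # hypothetical field


@dataclass(frozen=True)
class Tweet:
    id: int
    author_id: int  # references User.id
    text: str
    created_at: datetime


@dataclass(frozen=True)
class Follow:
    follower_id: int  # references User.id
    followee_id: int  # references User.id


alice = User(id=1, handle="alice")
bob = User(id=2, handle="bob")
tweet = Tweet(id=10, author_id=alice.id, text="hello",
              created_at=datetime(2024, 1, 1, tzinfo=timezone.utc))
edge = Follow(follower_id=bob.id, followee_id=alice.id)
```

This is deliberately not a full schema: three records and two foreign-key relationships are enough to give the data model a spine.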
API — the contract. One line per must-have feature, mapped to an endpoint:
- POST /tweets {text} → 201 with id
- GET /feed?cursor=… → page of tweets
- POST /follow {userId}
State the shape, the pagination (cursor, not offset, at scale), the idempotency story for writes that can be retried, and the error semantics. Picking REST vs gRPC vs GraphQL is a one-liner justified by the consumer: REST at the public edge, gRPC between services, async for anything over a second.
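Cursor pagination can be sketched as follows. This is a minimal in-memory illustration, assuming tweets are ordered newest-first by a monotonically increasing id; the list stands in for an indexed table, and `get_feed`, `encode_cursor`, and `decode_cursor` are hypothetical names.

```python
import base64
import json
from typing import Optional

# Newest-first tweets; a stand-in for a table indexed on id.
TWEETS = [{"id": i, "text": f"tweet {i}"} for i in range(10, 0, -1)]


def encode_cursor(last_id: int) -> str:
    """Opaque cursor so clients cannot depend on its internals."""
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def get_feed(cursor: Optional[str] = None, limit: int = 3) -> dict:
    last_id = decode_cursor(cursor) if cursor else None
    # Equivalent to: WHERE id < :last_id ORDER BY id DESC LIMIT :limit
    rows = [t for t in TWEETS if last_id is None or t["id"] < last_id][:limit]
    # A short page means the end was reached, so no further cursor is issued.
    next_cursor = encode_cursor(rows[-1]["id"]) if len(rows) == limit else None
    return {"items": rows, "next_cursor": next_cursor}


page1 = get_feed()
page2 = get_feed(page1["next_cursor"])
```

Because each page seeks from the last id seen rather than skipping a row count, the cost of a page does not grow with depth, and rows inserted at the head do not shift later pages.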
Keeping this phase tight means your high-level design has endpoints to wire up, not vibes.
Say this
- "Core entities: User, Tweet, Follow. The feed is derived, not stored raw."
- "`POST /tweets` takes an Idempotency-Key so a client retry can't double-post."
- "Cursor pagination, not offset: offset gets slow and inconsistent at scale."
Avoid this
- Jumping straight to boxes with no entities or endpoints defined.
- Designing a full normalized schema with every column in minute 8.
- Offset pagination on a feed that will have millions of rows.
Draw the high-level design
~10 min
Produce a clean box diagram that visibly satisfies every functional requirement, and nothing more.
Now draw the system. The goal is a diagram where you can trace each must-have feature across the boxes — not a museum of every component you know.
Start from the canonical request path and adapt: client → edge (CDN/LB) → stateless services → data tier, with an async side-channel (queue + workers) for anything that does not have to block the response.
Drive it from the API. Walk each endpoint through the boxes: "POST /tweets hits the gateway, the tweet service writes to the DB and enqueues a fan-out job." If a feature has no path through your diagram, the diagram is incomplete; if a box serves no feature, delete it.
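The traced write path can be sketched in a few lines: the request handler does the durable write and enqueues a job, and a background worker does the slow fan-out. Everything here is an in-memory stand-in under assumed names (`tweets_db`, `fanout_queue`, `feeds`); a real system would use a database and a message broker.

```python
import queue
import threading
from collections import defaultdict

tweets_db = {}                       # tweet_id -> (author_id, text)
followers = {1: [2, 3]}              # author_id -> follower ids
feeds = defaultdict(list)            # user_id -> tweet ids, newest first
fanout_queue: "queue.Queue[int]" = queue.Queue()


def post_tweet(author_id: int, tweet_id: int, text: str) -> dict:
    """Request path: write, enqueue, return. No fan-out work happens here."""
    tweets_db[tweet_id] = (author_id, text)
    fanout_queue.put(tweet_id)
    return {"status": 201, "id": tweet_id}


def fanout_worker() -> None:
    """Async side-channel: push each new tweet into followers' feeds."""
    while True:
        tweet_id = fanout_queue.get()
        author_id, _text = tweets_db[tweet_id]
        for follower_id in followers.get(author_id, []):
            feeds[follower_id].insert(0, tweet_id)
        fanout_queue.task_done()


threading.Thread(target=fanout_worker, daemon=True).start()

response = post_tweet(author_id=1, tweet_id=100, text="hello")
fanout_queue.join()  # only for the demo; a real handler would not wait
```

The handler returns as soon as the write and enqueue succeed, which is the requirement-driven justification for the queue: the write path does not block on fan-out.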
Justify each component by a requirement. A cache because reads are 100× writes; a queue because fan-out is slow; a CDN because you serve media globally. "I added Kafka because it's good" is a yellow flag — "I added a queue so the write path doesn't block on fan-out" is the senior version.
Keep it legible. A clean seven-box diagram that satisfies the requirements beats a twenty-box sprawl you can't explain.
A clean request path that visibly satisfies each functional requirement: client → edge → stateless services → data tier, with an async side-channel for anything that does not have to block the response. You should be able to trace every must-have feature across these boxes.
Say this
- "Let me trace `POST /tweets` through the design so we can see it works end to end."
- "This cache exists because reads are 100× writes; it's not decoration."
- "I'll keep the services stateless so they scale horizontally behind the LB."
Avoid this
- Drawing every component you've ever heard of "to be safe".
- Spaghetti with no clear request flow through it.
- Adding Kafka / microservices / multi-region with no requirement behind them.
Go deep where it counts
~15 min
Spend the largest block on the one or two hard problems; this is where senior and staff signal is earned.
The high-level design proves you can build the thing. The deep dives prove you can build it at scale — and this is the phase that separates levels. Spend the most time here.
Pick the one or two hardest parts and go deep, ideally the ones tied to your tightest non-functional requirements:
- The bottleneck under load (the hot path at peak QPS).
- The scaling lever (sharding key, cache strategy, fan-out on write vs read).
- The failure mode (what happens when the DB, region, or queue dies).
- The consistency edge (read-your-writes, idempotency, exactly-once myths).
Let the interviewer steer, then go deeper than asked. When they probe "what if this shard gets hot?", don't give one sentence — name the detection, the mitigation, and the trade-off. Quantify: "at 50K writes/sec the single primary tips over, so I shard by user_id; the cost is cross-user queries now scatter-gather, which I push to an analytics replica."
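Sharding by user_id and detecting a hot shard can be sketched as below. The shard count, the 2× threshold, and the names `shard_for` and `hot_shards` are assumptions for illustration. Note that simple modulo routing reshuffles most keys when the shard count changes; consistent hashing is a common alternative that this sketch does not implement.

```python
import hashlib
from collections import Counter
from typing import Iterable, List

NUM_SHARDS = 8  # hypothetical


def shard_for(user_id: str) -> int:
    """Stable hash of user_id, so the same user always lands on one shard."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def hot_shards(write_log: Iterable[str], factor: float = 2.0) -> List[int]:
    """Detection: shards receiving more than `factor` times the mean load."""
    writes = list(write_log)
    counts = Counter(shard_for(user_id) for user_id in writes)
    mean_load = len(writes) / NUM_SHARDS
    return sorted(s for s, c in counts.items() if c > factor * mean_load)


# One user dominating the write log makes that user's shard hot.
log = ["celebrity"] * 1000 + [f"user{i}" for i in range(100)]
print(hot_shards(log), shard_for("celebrity"))
```

Detection is only the first layer. The mitigation (splitting the key, caching, or isolating the tenant) and its trade-off still have to be named out loud.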
Name trade-offs out loud. Every senior answer is "X buys us A at the cost of B." Interviewers are explicitly told to probe trade-offs and to doubt your answers — meeting that with a calm "here's the tension and here's why I chose this side" is the strongest signal you can send.
Say this
- "The interesting problem here is the hot key. Let me design detection and three mitigations."
- "This buys us low latency at the cost of eventual consistency, which is fine for a feed."
- "If the region fails, here's the RTO/RPO and the failover runbook."
Avoid this
- Spreading thin: one shallow sentence on ten topics instead of depth on two.
- Going silent or defensive when the interviewer pushes back.
- Claiming "exactly-once delivery" or "100% consistent" without nuance.
Wrap up
~2 min
Land the plane: recap the design against the requirements, name the top trade-off, and say what you would do next.
Don't let the interview just run out of time mid-sentence. Take the last two minutes to close deliberately — it leaves a senior final impression.
Recap against the requirements. "We've got tweet creation, the home feed, and follows, all under the latency targets. Reads are cache-served, writes fan out async."
Name the biggest trade-off you made. "The main tension was feed freshness vs read latency — I chose fan-out-on-write for fast reads, accepting heavier write amplification for celebrity accounts, which I'd handle with the hybrid approach."
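The text does not spell out "the hybrid approach", so the sketch below shows one common reading of it, as an assumption: fan out on write for ordinary accounts, and pull celebrity posts at read time to avoid the write amplification. The threshold and all names are hypothetical, and the threshold is tiny only to keep the demo small.

```python
from collections import defaultdict
from typing import List, Tuple

Post = Tuple[int, str, str]  # (timestamp, author, text)

CELEBRITY_THRESHOLD = 3  # follower count; hypothetical and tiny for the demo

followers = {
    "alice": ["bob", "carol"],
    "star": ["bob", "carol", "dave", "erin"],
}
feed_cache = defaultdict(list)       # user -> posts pushed at write time
celebrity_posts = defaultdict(list)  # author -> posts pulled at read time


def is_celebrity(author: str) -> bool:
    return len(followers.get(author, [])) >= CELEBRITY_THRESHOLD


def publish(author: str, ts: int, text: str) -> None:
    post = (ts, author, text)
    if is_celebrity(author):
        celebrity_posts[author].append(post)  # one write, no fan-out
    else:
        for follower in followers.get(author, []):
            feed_cache[follower].append(post)  # fan-out on write


def read_feed(user: str, following: List[str], limit: int = 10) -> List[Post]:
    posts = list(feed_cache[user])
    for author in following:
        if is_celebrity(author):
            posts.extend(celebrity_posts[author])  # fan-out on read
    return sorted(posts, reverse=True)[:limit]


publish("alice", 1, "hi")
publish("star", 2, "yo")
bob_feed = read_feed("bob", following=["alice", "star"])
```

The trade-off is visible in the code: celebrity posts cost one write instead of one per follower, at the price of a merge step on every read by their followers.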
Say what you'd do with more time. "Next I'd harden the multi-region failover and add the abuse / rate-limiting layer." This shows you know what's unfinished — which is itself senior signal — rather than pretending the design is complete.
Say this
- "To recap against our requirements: …" then map the design back to the three must-haves.
- "The biggest trade-off was X; I chose this side because Y."
- "With more time I'd tackle multi-region failover and the abuse layer next."
Avoid this
- Letting the clock run out with no summary.
- Claiming the design is fully complete and bullet-proof.
- Introducing a brand-new component in the final minute.
The first two minutes
Good vs bad opening
The opening sets the tone for the whole round. Here's the difference between a candidate who looks senior in minute one and one who's already off-track.
Interviewer probe
The interviewer says: "Design a URL shortener." What are your first two minutes?
Weak answer
"OK, a URL shortener. So we'll need a database to store URLs, probably Postgres, and an API to create them and redirect. Let me start drawing — we'll have a load balancer, then a service, then the database, and we'll add a cache for performance…"
Strong answer
"Let me scope it first. Functional must-haves: create a short URL, redirect a short URL, and optional custom aliases. I'll treat analytics as out of scope for v1. Non-functional: redirects are the hot path, so p99 under 100 ms and 99.99% availability; creation can be slower, 500 ms is fine. Reads dominate writes by maybe 100:1, so this is a read-heavy, latency-sensitive system, which tells me it's a caching problem, not a sharding problem. Scale: I'll assume 100M new URLs and 10B redirects a year, so roughly 300 redirect QPS on average and about 1K at peak. Does that framing match what you have in mind? If so, I'll define the API and entities next, then draw the design."
Why it wins: Scopes to three features with an explicit out-of-scope, separates read vs write NFRs with numbers, draws the "read-heavy → caching" conclusion that shapes the whole design, states scale assumptions, and invites one correction — all before drawing a single box.
Meta-mistakes
How strong engineers still fail this round
These aren't knowledge gaps — they're delivery mistakes. Each one sinks candidates who could have built the system.
Drawing boxes in minute two for a system you haven't bounded. You end up building the wrong thing well. Spend the first five minutes narrowing to three features and numbered NFRs — everything downstream depends on it.
Perfecting the user table for fifteen minutes and never reaching the interesting scaling problem. Budget the phases, watch the clock, and deliberately move on: "I'll park the schema details and go to the hard part — the feed fan-out."
One shallow sentence on ten topics. Interviewers read this as memorised surface knowledge. Pick the one or two hardest problems and go three layers deep — detection, mitigation, trade-off — instead of grazing everything.
"Fast", "scalable", "a lot of traffic". Without numbers the interviewer can't tell whether you've reasoned about scale. Put a figure on the NFRs and the capacity math, and tie each design decision to one.
Interviewers are told to doubt your answers and probe trade-offs. Treating that as an attack tanks the communication score. Treat every "are you sure?" as an invitation to show your reasoning, and concede genuinely better ideas.
Thinking quietly and drawing without narration. The interviewer grades what you say, not what you think. Narrate the decision and the alternative you rejected: "I'll use a queue here rather than a synchronous call because the fan-out is slow."
The bar
What each level looks like
The same prompt is graded against a higher bar as you climb. The depth interviewers expect is proportional to the level.
Mid-level
Covers the basics correctly: scopes the problem, produces a working high-level design, and handles the obvious scale with a cache and a queue. Depth in deep dives is limited but the system works.
Senior
Moves through the basics quickly to leave time for depth. Drives the design from requirements, quantifies decisions, and goes two-plus layers deep on the hard problems with explicit trade-offs.
Staff+
Owns the ambiguity. Frames the problem space, makes and defends opinionated trade-offs, anticipates failure and evolution, and connects the design to operational and organisational reality — not just the happy path.
Revision
The framework on one card
Skim this on the morning of your interview.
- Scope first, design second. Five minutes narrowing beats forty minutes building the wrong thing.
- Three functional must-haves. Everything else is parked or out of scope; say so explicitly.
- Every NFR gets a number. Latency (percentile), availability, durability, consistency, scale.
- Estimate only the numbers that change a decision, then act on them out loud.
- Define entities + API before boxes, so the design has a concrete target.
- High-level design: every box maps to a requirement; trace each feature through it.
- Spend the most time on deep dives: one or two hard problems, three layers deep, with trade-offs.
- Wrap up: recap against requirements, name the top trade-off, say what you'd do next.
- Narrate everything. The interviewer grades what you say, not what you think.
- Push-back is an invitation, not an attack. Show your reasoning; concede better ideas.
Now run it on a real problem.
The framework only sticks if you apply it under interview conditions. Pick a problem and work the six phases out loud.