Design a distributed in-memory key-value store / cache (like Redis Cluster, Memcached, or DynamoDB's storage layer). It partitions data across many nodes so the dataset can exceed one machine's memory, replicates each partition for availability, and keeps latency in the single-digit milliseconds while surviving node failures and rebalancing as the cluster grows. The hard parts are three. First, choosing a partitioning scheme that doesn't reshuffle the whole keyspace when a node joins. Second, picking a replication and consistency model that trades latency against durability. Third, handling hot keys that overwhelm a single node.
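To see why naive placement "reshuffles the world," here is a small sketch that assumes keys are assigned with hash(key) mod N. The helper names are illustrative and not taken from any particular system. When the cluster grows from 4 to 5 nodes, a key keeps its owner only if hash mod 4 equals hash mod 5. That holds for just 4 of every 20 residues, so roughly 80% of keys move.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use a stable digest.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def mod_n_owner(key: str, num_nodes: int) -> int:
    # Naive (hypothetical) placement: node index = hash(key) mod N.
    return stable_hash(key) % num_nodes


def fraction_moved(keys: list[str], old_n: int, new_n: int) -> float:
    # Share of keys whose owning node changes when the cluster is resized.
    moved = sum(1 for k in keys if mod_n_owner(k, old_n) != mod_n_owner(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(20_000)]
print(f"4 -> 5 nodes: {fraction_moved(keys, 4, 5):.0%} of keys move")  # roughly 80%
```

Consistent hashing and fixed hash slots both aim to cut that movement down to roughly the new node's share of the keyspace.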
Best after a few full reps. Expect follow-up questions, edge cases, and deeper trade-off discussion.
5 stages
45 min
Grade anytime
Workspace-first, hints visible, stage retry available. This is the cheap, repeatable loop: build the answer shape before you take it under pressure.
Solve once, compare against the checklist, then come back to the weak stage instead of starting over.
Strict timer, hints hidden, debrief deferred to the end. Use this once you can already structure a clean answer and want to pressure-test pacing and pushback.
Best after one structured rep · timed · focused on pacing and communication.
This is the framing pass. A strong answer quickly defines what the system must do, what quality bar it has to hit, and the numbers that will justify the rest of the design.
What must exist
What good looks like
What to cover
Numbers to anchor the design
Each stage has a distinct job. Treat them like separate deliverables instead of one giant answer, and the round becomes much easier to navigate.
Define the contract clearly: the endpoints, auth boundary, error semantics, and the one or two decisions that matter most.
What you should produce
Define the API. What do GET / PUT / DELETE look like, and how does a client reach the right node?
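One common answer to the routing question is a smart client that keeps a copy of the cluster's consistent-hash ring. The client hashes the key, walks clockwise to the first node, and sends the GET / PUT / DELETE there. The next distinct nodes on the ring act as replicas. The sketch below is minimal and assumes 100 virtual nodes per physical node and MD5 as the placement hash; both are arbitrary choices. `HashRing` and its methods are hypothetical names, not a library API. Redis Cluster reaches the same goal differently: it uses 16384 fixed hash slots, and servers answer with a MOVED redirect when a client's slot map is stale.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() changes between processes.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, vnodes_per_node: int = 100):
        self.vnodes_per_node = vnodes_per_node
        self._tokens: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> physical node

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            token = stable_hash(f"{node}#vn{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def preference_list(self, key: str, replicas: int = 3) -> list[str]:
        # Walk clockwise from hash(key), collecting distinct physical nodes:
        # the first is the primary, the rest hold replicas.
        if not self._tokens:
            raise LookupError("ring is empty")
        start = bisect.bisect_right(self._tokens, stable_hash(key))
        nodes: list[str] = []
        for step in range(len(self._tokens)):
            token = self._tokens[(start + step) % len(self._tokens)]
            node = self._owner[token]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == replicas:
                    break
        return nodes

    def owner(self, key: str) -> str:
        return self.preference_list(key, replicas=1)[0]


ring = HashRing()
for name in ["node-a", "node-b", "node-c", "node-d"]:
    ring.add_node(name)
print(ring.preference_list("user:42"))  # primary first, then two replicas

keys = [f"user:{i}" for i in range(20_000)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-e")
moved = sum(before[k] != ring.owner(k) for k in keys) / len(keys)
print(f"{moved:.0%} of keys moved")  # close to 1/5, all of them onto node-e
```

Adding a fifth node moves only about a fifth of the keys, and every moved key lands on the new node. Virtual nodes keep each physical node's share close to even.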
Strong answers cover
Lay out the main components and trace the write path, read path, and any async path cleanly.
What you should walk through
Walk me through the architecture.
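Here is a minimal sketch of the replicated write and read paths, assuming a Dynamo-style strict quorum. Each key lives on N replicas. A write succeeds once W replicas acknowledge it, and a read consults R replicas and returns the newest version. Choosing R + W > N guarantees that the read set overlaps the last successful write. The in-process `Replica` objects stand in for network calls. The single version counter is a simplifying assumption; real systems typically use per-key timestamps or vector clocks. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical in-process stand-in for one replica node.
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)

    def put(self, key, version, value) -> bool:
        if not self.up:
            return False  # models a timeout or unreachable node
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)  # never overwrite a newer version
        return True

    def get(self, key):
        if not self.up:
            return None, False
        return self.data.get(key), True


class QuorumCoordinator:
    """N = len(replicas); writes need W acks, reads consult R replicas."""

    def __init__(self, replicas, w: int, r: int):
        self.replicas, self.w, self.r = replicas, w, r
        self._version = 0  # assumption: one coordinator issues all versions

    def put(self, key, value) -> bool:
        self._version += 1
        # Sent sequentially here; a real coordinator fans out in parallel with a timeout.
        acks = sum(rep.put(key, self._version, value) for rep in self.replicas)
        # A failed write may still have landed on some replicas (indeterminate result).
        return acks >= self.w

    def get(self, key):
        responses = []
        for rep in self.replicas:
            entry, ok = rep.get(key)
            if ok:
                responses.append((rep, entry))
            if len(responses) == self.r:
                break
        if len(responses) < self.r:
            raise TimeoutError("read quorum not reached")
        found = [entry for _, entry in responses if entry is not None]
        if not found:
            return None
        newest = max(found, key=lambda e: e[0])
        for rep, entry in responses:
            if entry != newest:
                rep.put(key, *newest)  # read repair: push newest version to stale replicas
        return newest[1]


# N=3, W=2, R=2 so R + W > N: every read quorum overlaps the last successful write.
replicas = [Replica("a"), Replica("b"), Replica("c")]
coord = QuorumCoordinator(replicas, w=2, r=2)
replicas[2].up = False            # c misses the write
print(coord.put("k", "v1"))       # True: a and b acknowledged
replicas[0].up = False            # now a is down and c is back
replicas[2].up = True
print(coord.get("k"))             # v1, read from b and c; c gets repaired
```

Lowering W or R cuts latency at the cost of stale reads or lost writes. Redis Cluster sits at the fast end: it replicates asynchronously from each primary, so a failover can lose recently acknowledged writes.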
Strong answers cover
Pick the store, show the schema or key model, and explain why that storage choice fits the access pattern.
What you should lock in
What lives on each node? How do you expire keys, evict under memory pressure, and persist for durable mode?
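Here is a minimal per-node sketch. It assumes values are raw bytes and approximates memory as key length plus value length; real stores also pay per-entry overhead. An `OrderedDict` gives O(1) LRU ordering. Expiry is checked lazily on read, and each write evicts least-recently-used entries until the node is back under its budget. `NodeStore` and its methods are hypothetical names, and persistence is left out.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class NodeStore:
    """Hypothetical per-node store: LRU eviction under a byte budget plus TTL expiry."""

    def __init__(self, max_bytes: int, clock: Callable[[], float] = time.monotonic):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, expires_at or None); oldest first

    @staticmethod
    def _size(key: str, value: bytes) -> int:
        return len(key.encode()) + len(value)  # rough: ignores per-entry overhead

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            self._remove(key)  # lazy expiry: checked only when the key is touched
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes, ttl: Optional[float] = None) -> None:
        size = self._size(key, value)
        if size > self.max_bytes:
            raise ValueError("entry larger than the node's memory budget")
        if key in self._data:
            self._remove(key)
        expires_at = self.clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)  # new entries start as most recent
        self.used_bytes += size
        while self.used_bytes > self.max_bytes:
            self._remove(next(iter(self._data)))  # evict least recently used

    def delete(self, key: str) -> bool:
        if key not in self._data:
            return False
        self._remove(key)
        return True

    def _remove(self, key: str) -> None:
        value, _ = self._data.pop(key)
        self.used_bytes -= self._size(key, value)


# Usage with a fake clock so expiry is deterministic.
now = [0.0]
store = NodeStore(max_bytes=20, clock=lambda: now[0])
store.put("a", b"12345")           # 6 bytes
store.put("b", b"12345")           # 12 bytes total
store.get("a")                     # "a" becomes most recently used
store.put("c", b"12345")           # 18 bytes
store.put("d", b"12345")           # 24 > 20, so LRU key "b" is evicted
print(store.get("b"))              # None
store.put("s", b"x", ttl=5.0)      # expires at t=5
now[0] = 10.0
print(store.get("s"))              # None (expired)
```

Expired keys that are never read again linger until LRU eviction reaches them. Redis, for comparison, pairs this lazy check with a periodic pass that samples keys carrying a TTL. In durable mode, each put and delete would also be appended to a log so a restarted node can replay its state. Redis offers both an append-only file and point-in-time RDB snapshots for this.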
Strong answers cover
Name the first bottleneck, the main failure modes, and the trade-offs that keep the system fast and reliable under pressure.
What you should pressure-test
Let's deep-dive the hard parts: a hot key getting 100K req/sec, rebalancing when a node joins, and what happens when a node dies.
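One common mitigation for hot keys is key splitting. The hot key is written under several salted names, such as `celebrity:1#0` through `celebrity:1#7`, so the copies hash to different partitions, and each read picks one copy at random. The sketch below assumes the hot keys are already known, for example from per-node request counters. It uses hash mod partition count as a stand-in for the real placement. `HotKeySplitter` is a hypothetical helper.

```python
import hashlib
import random
from collections import Counter


def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for the cluster's real placement (ring or hash slots).
    return stable_hash(key) % num_partitions


class HotKeySplitter:
    """Hypothetical helper: spread known-hot keys over `fanout` salted copies."""

    def __init__(self, hot_keys: set[str], fanout: int = 8):
        self.hot_keys = hot_keys
        self.fanout = fanout

    def write_keys(self, key: str) -> list[str]:
        # A write to a hot key must update every copy.
        if key not in self.hot_keys:
            return [key]
        return [f"{key}#{i}" for i in range(self.fanout)]

    def read_key(self, key: str, rng: random.Random) -> str:
        # A read picks one copy at random, so load spreads across partitions.
        if key not in self.hot_keys:
            return key
        return f"{key}#{rng.randrange(self.fanout)}"


splitter = HotKeySplitter({"celebrity:1"}, fanout=8)
rng = random.Random(42)
reads = [splitter.read_key("celebrity:1", rng) for _ in range(80_000)]
load = Counter(partition_for(k, 16) for k in reads)
print(f"{len(load)} partitions serve the key; busiest takes {max(load.values()) / len(reads):.0%}")
```

The cost moves to writes, which must update every copy, and readers may briefly see copies at different versions. Splitting therefore suits read-heavy hot keys. A short-TTL client-side cache in front of the cluster is the other usual lever.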
Strong answers cover
Translate the prompt into concrete requirements, scale, and trade-offs before drawing architecture.
Give APIs in the API stage, data models in the storage stage, and failure modes in scaling. Don't blur them together.
Grade early, compare to the reference reasoning criteria, fix the biggest misses, and re-submit the weak stage instead of starting over.