Practice catalog
Every problem ships with a checklist of what strong answers cover, a stage-by-stage workspace, and a debrief that points you to the exact concept to study next.
Design a URL shortening service like Bit.ly that takes long URLs and creates short, unique aliases. Users should be able to create short URLs, be redirected when visiting them, and optionally track click analytics. The system should handle millions of URLs and redirect requests with minimal latency.
Design a rate limiting system that controls the rate of requests a client can send to an API. The rate limiter should support different rate limiting algorithms, work in a distributed environment, and provide clear feedback to clients when they are throttled. Consider how this would work as both a standalone service and as middleware.
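One of the algorithms such a limiter might support is a token bucket. The sketch below is a minimal single-process version, with hypothetical names (`TokenBucket`, `allow`) and the caller supplying the clock; a distributed deployment would need the bucket state in a shared store, which is not shown here. The second return value is the kind of "retry after" hint that could be passed back to a throttled client.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; all names here are assumptions, not a real API."""
    capacity: float          # maximum burst size, in requests
    refill_rate: float       # tokens added per second
    tokens: float = 0.0      # tokens currently available
    last_refill: float = 0.0  # timestamp of the last refill, in seconds

    def allow(self, now: float) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for one request arriving at `now`."""
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)

        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Not enough tokens: report how long until one full token is available.
        return False, (1.0 - self.tokens) / self.refill_rate
```

With `capacity=2` and `refill_rate=1.0`, a client can burst two requests, is throttled on the third, and is admitted again one second later.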
Design a real-time chat application like WhatsApp or Messenger that supports 1:1 and group messaging (up to 100 participants). Users should be able to send text and media messages, receive messages in real time when online, and retrieve undelivered messages when they come back online. The system must handle billions of users with low-latency delivery and guaranteed message durability.
Design a social media news feed system like Facebook's News Feed or Twitter's Home Timeline. When a user opens the app, they see a personalized feed of posts from people and pages they follow, ranked by relevance. The system must handle billions of users, hundreds of millions of posts per day, and deliver a fresh feed within seconds. The key design decision is how to generate the feed: pre-compute it on write (fan-out-on-write) or assemble it on read (fan-out-on-read).
Design a notification system that can send notifications to users across multiple channels — push notifications (iOS/Android), SMS, and email. The system must handle billions of notifications per day, support user preferences and opt-outs, de-duplicate notifications, and ensure reliable delivery with prioritization. Upstream services (e.g., orders, social, marketing) publish notification events; the notification service handles routing, templating, channel selection, and delivery.
Design a search autocomplete (typeahead) system like Google Search suggestions. As a user types each character, the system returns the top 5-10 most relevant search suggestions within 100ms. The system must handle billions of queries per day, rank suggestions by popularity and personalization, and update its suggestion corpus as new queries emerge.
Design a ticket booking system like Ticketmaster or BookMyShow for live events (concerts, sports). Users browse events, select seats from a venue map, and complete payment — all while thousands of other users compete for the same seats. The system must prevent double-booking (two users paying for the same seat), handle massive traffic surges when popular events go on sale, and provide a fair booking experience under extreme contention.
Design the inference layer for a B2B platform whose customers embed LLM features in their apps. Each tenant sends millions of prompt requests per day across multiple foundation-model providers. The gateway must route by tenant policy, cache responses safely, fall back across providers on outage, enforce per-tenant cost and rate budgets, stream tokens end-to-end, and keep an auditable log of every call. Cost runaway, provider outages, and noisy-neighbor tenants are the day-1 production failure modes.
Design the webhook layer for a B2B platform (think Stripe events, GitHub webhooks). When something happens internally (charge succeeded, PR opened), an event must be delivered to every customer endpoint subscribed to that event type — durably, signed, and with retries on failure. Customers get a dashboard to inspect delivery attempts and replay failed events. Endpoints fail constantly: timeouts, 5xx, slow consumers, customers down for hours. The hard problems are slow-customer isolation, retry-storm protection, and dashboard replay without overwhelming a recovering customer.
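The "signed, with retries" requirement can be sketched with an HMAC over the payload plus a capped exponential backoff. Everything below is an assumption for illustration: the function names, the `timestamp.body` message layout, the 300-second tolerance, and the backoff constants are not taken from any particular provider.

```python
import hashlib
import hmac


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over 'timestamp.body' (hypothetical message layout)."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: int, tolerance_s: int = 300) -> bool:
    """Customer-side check: reject stale timestamps, then compare in constant time."""
    if abs(now - timestamp) > tolerance_s:
        return False
    return hmac.compare_digest(sign(secret, timestamp, body), signature)


def retry_delay(attempt: int, base_s: float = 1.0, cap_s: float = 3600.0) -> float:
    """Capped exponential backoff: 1s, 2s, 4s, ... up to cap_s."""
    return min(cap_s, base_s * 2 ** attempt)
```

Binding the timestamp into the signature limits how long a captured delivery can be replayed. A production retry schedule would likely add jitter so that many failed deliveries do not retry in lockstep.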
Design a platform where users solve programming problems, submit code, run public samples, get judged against hidden tests, join timed contests, and see rankings. The hard parts are sandboxed execution, queue isolation across languages and tenants, hidden test-case secrecy, contest traffic spikes, leaderboard freshness, plagiarism signals, and trustworthy judging results.
Design a community platform where makers launch products, users vote and comment, and each day has a ranked launch feed. The hard parts are ranking freshness, vote manipulation, launch-day traffic spikes, moderation, notifications, search/discovery, and making ranks rebuildable when scoring rules change.
Design a pastebin service where users submit a blob of text or code and get back a short, shareable URL. Pastes can be public or private and may carry an optional expiration. Reads dominate writes ~100:1, payloads range from a few kilobytes to several megabytes, and the system must bound its own storage growth — which makes tiered storage, retention, and unguessable IDs the decisions that separate a passing answer from a strong one.
Design a feature-flag and rollout-control platform like LaunchDarkly. Applications ask "is this feature on for this user?" millions of times a second, so the answer has to be instant — which is the whole twist: evaluation happens locally inside the SDK, not as a network call. The hard system-design surface is the control plane: distributing rule changes to ~100M long-lived SDK connections within seconds, bucketing percentage rollouts deterministically so a user never flickers, and staying available for customer apps even when your own service is down.
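Deterministic bucketing can be illustrated with a stable hash of the flag key and user ID. This is a sketch under assumptions: the function names, the choice of SHA-256, and the 10,000-bucket granularity are illustrative and not a description of any vendor's SDK.

```python
import hashlib


def bucket(flag_key: str, user_id: str, buckets: int = 10_000) -> int:
    """Map (flag, user) to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """True if the user falls inside the rollout; evaluated locally, no network call."""
    # With 10,000 buckets, each bucket represents 0.01% of users.
    return bucket(flag_key, user_id) < rollout_percent * 100
```

Because a user's bucket never changes, raising the rollout from 10% to 50% only adds users; nobody who already had the feature loses it, which is what prevents flicker. Including the flag key in the hash keeps the same users from always landing in the early cohort of every flag.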
Design the core money ledger behind a payments platform. Every customer, merchant, fee pool, and FX reserve is an account; money moves between them as transfers. The catch is that this is money — you can never lose a cent, never create one, never double-post a retried request, and you must be able to prove years later exactly how any balance came to be. The naive "UPDATE accounts SET balance = balance - amount" is the trap; the real design is an immutable double-entry journal where balances are derived, postings are atomic and idempotent, and the books always reconcile to zero.
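The contrast with the naive balance update can be sketched as an append-only journal in which every transfer writes two postings that sum to zero. This is an in-memory illustration with hypothetical names (`Ledger`, `post_transfer`); a real system would write both postings and the idempotency record in one database transaction, which is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Illustrative double-entry journal; amounts are integer cents."""
    journal: list = field(default_factory=list)  # append-only (transfer_id, account, amount)
    seen: set = field(default_factory=set)       # transfer IDs already posted

    def post_transfer(self, transfer_id: str, from_account: str,
                      to_account: str, amount_cents: int) -> bool:
        """Post a transfer once. Returns False if this transfer_id was already posted."""
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if transfer_id in self.seen:
            return False  # retried request: do not double-post
        # Two postings that sum to zero, so money is neither created nor lost.
        self.journal.append((transfer_id, from_account, -amount_cents))
        self.journal.append((transfer_id, to_account, amount_cents))
        self.seen.add(transfer_id)
        return True

    def balance(self, account: str) -> int:
        """Balances are derived from the journal, never stored and mutated."""
        return sum(amt for _, acc, amt in self.journal if acc == account)

    def total(self) -> int:
        """Sum of all postings; should always be zero."""
        return sum(amt for _, _, amt in self.journal)
```

Replaying the journal reproduces any balance, which is what makes it possible to explain years later how a figure came to be. Scanning the whole journal per lookup is only for illustration; a real design would maintain derived balances or snapshots.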
Design the editing backbone of a Google Docs–style product: many people open the same document and type at the same time. Each editor sees their own keystrokes instantly and everyone else’s within a fraction of a second, with live cursors and presence. The defining challenge is convergence — concurrent edits must never be silently lost, and every client must end up with an identical document. That forces a real conflict-resolution model (OT or CRDT), a server-assigned total order per document, and an append-only operation log with periodic snapshots, rather than last-write-wins on a document blob.
Design the moderation backbone of a large UGC platform: ~500M pieces of content a day — images, video, and text — must be screened for policy violations (CSAM, nudity, violence, hate, spam) before causing harm. The defining challenge is that you can neither run every model synchronously on upload (it would add seconds of latency and collapse under peak) nor have humans review everything (15K reviewers can touch ~1–2% of the firehose). The right shape is a multi-stage pipeline: a fast synchronous known-bad hash gate that hard-blocks the worst content before publish, an asynchronous ML classifier fan-out over optimistically-published content that takes things down when flagged, and a prioritized human-review queue ranked by severity × reach.
Design a Retrieval-Augmented Generation system that answers natural-language questions over a large private corpus. The system ingests and indexes documents, retrieves the most relevant passages for each question, and has an LLM generate a grounded, cited answer. It must stay fresh as documents change, keep answers faithful to the corpus, control cost, and isolate thousands of tenants — at 50M documents, 500M vectors, and 5K queries per second.
Design the home timeline for a social network like Twitter/X. Users follow hundreds of accounts and, on opening the app, should instantly see a fresh feed of those accounts' recent posts. The hard part is fan-out: a post from an account with tens of millions of followers must reach every follower's timeline without melting the system or slowing everyone else down. Scale: 500M DAU, 2B users, 400M tweets/day, sub-200ms timeline loads.
Design the dispatch core of a ride-hailing service. Drivers stream GPS continuously; a rider requests a ride and the system finds nearby available drivers, matches one, and tracks the trip. The hard parts are the firehose of location updates (~1M+/sec) and the geospatial query — given a point, find the closest available drivers in milliseconds — at city scale, in dense areas, without ever handing the same driver to two riders.
Design the video platform behind YouTube. A creator uploads a large, possibly flaky video file; the system ingests it reliably, transcodes it into multiple adaptive-bitrate renditions, and streams it smoothly to billions of viewers worldwide. The two hard parts are the offline transcoding pipeline and serving an enormous read volume of video bytes cheaply — overwhelmingly a CDN and storage problem. Scale: 2B users, ~500 hours uploaded/min, ~1B watch-hours/day.
Design a distributed job scheduler (cron-as-a-service). Users submit one-time and recurring jobs that must fire close to their scheduled time and execute via a worker fleet without being lost or silently double-running, even when workers and schedulers crash. The hard parts are efficiently finding which jobs are due at scale and guaranteeing once-ish execution despite failures and the spikes at round times, when millions of jobs are all scheduled for the top of the hour. Scale: 100M+ jobs, ~10K submissions/sec.
Design the analytics backbone for an ad network. Every ad click generates an event; the system must ingest these at enormous volume, aggregate them by ad over time windows, and serve the results both in near-real-time for dashboards and accurately for billing. The hard parts are the write firehose (millions of events/sec), getting counts right despite duplicate and out-of-order events, and answering aggregation queries fast without scanning raw events. Scale: billions of clicks/day, ~1M events/sec.
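Getting counts right despite duplicates and out-of-order arrival can be sketched by de-duplicating on an event ID and bucketing by event time, not arrival time. The tuple layout `(event_id, ad_id, timestamp_s)`, the function name, and the 60-second tumbling window are assumptions for illustration; at real volume the seen-set and counters would live in a stream processor's state, not in a Python set.

```python
from collections import Counter


def aggregate(events, window_s: int = 60) -> Counter:
    """Count unique clicks per (ad_id, window_start), keyed by event time."""
    seen = set()
    counts = Counter()
    for event_id, ad_id, timestamp_s in events:
        if event_id in seen:
            continue  # duplicate delivery: count each click once
        seen.add(event_id)
        # Bucket by when the click happened, so late arrivals land in the right window.
        window_start = timestamp_s - timestamp_s % window_s
        counts[(ad_id, window_start)] += 1
    return counts
```

Queries then read the pre-aggregated `(ad_id, window_start)` counts instead of scanning raw events.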
Design a distributed in-memory key-value store / cache (like Redis Cluster, Memcached, or DynamoDB's storage layer) that partitions data across many nodes so the dataset exceeds one machine's memory, replicates each partition for availability, and stays fast (single-digit-millisecond) while surviving node failures and rebalancing as the cluster grows. The hard parts are choosing a partitioning scheme that doesn't reshuffle the world when a node joins, a replication + consistency model that trades off latency vs durability, and handling hot keys that overwhelm a single node.
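A partitioning scheme that avoids reshuffling everything when a node joins is consistent hashing with virtual nodes. The sketch below is one minimal way to write it; the class name `HashRing`, the use of SHA-256, and the default of 100 virtual nodes per physical node are illustrative assumptions. Replication and hot-key handling are not shown.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring_position, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many small arcs, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # Walk clockwise to the first point at or after the key, wrapping around.
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]
```

When a fourth node joins a three-node ring, the only keys that move are the ones the new node takes over, roughly a quarter of them in expectation; every other key stays where it was.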