intermediate · 15m

Architecture

Protocol choice

REST for humans, gRPC for services, GraphQL for views, async for anything that takes more than a second. Most candidates default to "REST everywhere" — that's fine until it isn't, and the interviewer will find the seam.

intermediate · 10m

Architecture

Real-time transport choices

Polling, long-polling, SSE, WebSocket — the transport choice is about connection count and direction, not cleverness. Pick wrong and you pay in money or in user-visible lag.

core · 15m

API design

Idempotency

Every retry is a test of your idempotency design. Networks drop, clients retry, at-least-once queues redeliver — if your write path can't absorb a duplicate without double-charging or double-sending, you'll discover it at 2 AM on a Sunday.
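
A minimal sketch of absorbing a duplicate write with an idempotency key. The `PaymentService` class, its in-memory stores, and the `req-123` key are hypothetical names for illustration; a real system would persist the key and the side effect atomically.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Toy write path that absorbs duplicate requests via an idempotency key."""

    # Hypothetical in-memory stores (assumption): a real system would persist
    # both in the same transaction so they cannot diverge.
    results: dict = field(default_factory=dict)  # idempotency key -> stored response
    charges: list = field(default_factory=list)  # side effects actually performed

    def charge(self, idempotency_key: str, account: str, cents: int) -> dict:
        # A retry with a key we have already seen returns the stored response
        # instead of charging again.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges.append((account, cents))
        response = {"status": "charged", "account": account, "cents": cents}
        self.results[idempotency_key] = response
        return response


svc = PaymentService()
first = svc.charge("req-123", "acct-1", 500)
retry = svc.charge("req-123", "acct-1", 500)  # client retry after a timeout
```

The retry gets the same response as the first call, and only one charge is recorded.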

intermediate · 15m

Data & storage

Indexing strategies

The index you picked three months ago decides your query latency today — and the one you didn't create decides which queries you can't ship. Indexing is not "add indexes until it's fast"; it's a first-principles match between query shape and index structure.

intermediate · 10m

Data & storage

Search indexing

Full-text search is an inverted index plus relevance; the hard part is relevance. LIKE '%foo%' on a SQL table is not search — it's a table scan waiting to die.
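
To make "inverted index" concrete, here is a small sketch that maps tokens to document ids and answers AND queries by intersecting posting sets. The function names and sample documents are illustrative assumptions, and the sketch deliberately omits relevance ranking, which is the hard part.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase token to the set of document ids containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # One posting set per token; unknown tokens contribute an empty set.
    postings = [index.get(t, set()) for t in tokens]
    return set.intersection(*postings)


# Hypothetical sample corpus.
docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick red fox"}
index = build_inverted_index(docs)
hits = search(index, "quick fox")
```

Each lookup touches only the posting sets for the query tokens, rather than scanning every row the way a leading-wildcard LIKE does.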

intermediate · 10m

Data & storage

Large blob handling

Files above a few MB should never touch your app server. Every byte that flows through your app is bandwidth you pay for, memory you stress, and latency you inflict.

intermediate · 10m

Data & storage

Data partitioning strategies

Picking the right partition key is more important than picking the right database. Wrong key = hot shards, unhappy scans, and painful rebalancing — regardless of how good the database is.
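
A rough illustration of how the key choice alone creates a hot shard. The workload (1,000 events, 90% from one tenant), the shard count, and the helper names are assumptions made up for this sketch.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash partitioning: the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_per_shard(keys: list[str], num_shards: int) -> Counter:
    """Count how many items land on each shard for a given key choice."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Hypothetical workload: every tenth event comes from a small tenant,
# the other 900 come from one large tenant.
events = [
    ("tenant-small" if i % 10 == 0 else "tenant-big", f"evt-{i}")
    for i in range(1000)
]

by_tenant = load_per_shard([tenant for tenant, _ in events], 4)  # low-cardinality key
by_event = load_per_shard([evt for _, evt in events], 4)         # high-cardinality key
```

Partitioning by tenant pins at least 900 of the 1,000 events to a single shard, while partitioning by event id spreads them across shards. The trade-off is that per-tenant scans then have to fan out to every shard.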

Pro · advanced · 10m

Data & storage

CDC and eventing

Change-data-capture is how you keep read models fresh without dual writes. The DB's log is already your most reliable event stream — use it.

Pro · advanced · 10m

Data & storage

Geospatial indexing

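Rectangular lat/lng range queries on a plain B-tree degrade quickly as the table grows: the index can narrow only one dimension, so every row in that band still has to be filtered on the other. "Find all riders within 5 km" is not a B-tree query; it needs a spatial index.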
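
One simple form of spatial index is a fixed grid: bucket points by cell, scan only the cells that can overlap the search radius, then filter by exact distance. The sketch below assumes a cell size of 0.05 degrees and about 111 km per degree, ignores the antimeridian and the poles, and uses hypothetical names (`GridIndex`, `cell_of`). Production systems typically reach for geohash, S2, H3, or an R-tree instead.

```python
import math
from collections import defaultdict

CELL_DEG = 0.05     # assumed cell size in degrees (about 5.5 km of latitude)
KM_PER_DEG = 111.0  # approximate km per degree of latitude (assumption)


def cell_of(lat: float, lng: float) -> tuple[int, int]:
    """Grid cell (row, column) containing the point."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class GridIndex:
    def __init__(self) -> None:
        self.cells = defaultdict(list)  # cell -> [(rider_id, lat, lng)]

    def add(self, rider_id: str, lat: float, lng: float) -> None:
        self.cells[cell_of(lat, lng)].append((rider_id, lat, lng))

    def within(self, lat: float, lng: float, radius_km: float) -> list[str]:
        # Work out how many cells the radius can span in each direction.
        lat_span = radius_km / KM_PER_DEG
        edge_lat = min(abs(lat) + lat_span, 89.0)
        lng_span = radius_km / (KM_PER_DEG * math.cos(math.radians(edge_lat)))
        dy = math.ceil(lat_span / CELL_DEG)
        dx = math.ceil(lng_span / CELL_DEG) + 1  # one extra column as a safety margin
        cy, cx = cell_of(lat, lng)
        hits = []
        for y in range(cy - dy, cy + dy + 1):
            for x in range(cx - dx, cx + dx + 1):
                for rider_id, rlat, rlng in self.cells.get((y, x), ()):
                    # Exact distance check on the small candidate set.
                    if haversine_km(lat, lng, rlat, rlng) <= radius_km:
                        hits.append(rider_id)
        return sorted(hits)


# Hypothetical riders around one query point.
idx = GridIndex()
idx.add("a", 12.9750, 77.6000)  # under 1 km away
idx.add("b", 13.0500, 77.5946)  # roughly 8.7 km north
idx.add("c", 12.2958, 76.6394)  # a different city
nearby = idx.within(12.9716, 77.5946, 5.0)
```

The query inspects a handful of cells instead of every row, and the exact distance check runs only on the candidates those cells contain.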

intermediate · 10m

Scalability

CDN and edge caching

The best cache is the one that never hits your origin. A CDN turns "my origin is overloaded" into "my origin is bored" — if you use it right.

intermediate · 10m

Reliability

Delivery semantics

Exactly-once is a lie you tell clients; at-least-once + idempotent consumers is the truth. Even Kafka's "exactly-once" is exactly-once *within Kafka* — not end to end.

Pro · advanced · 10m

Reliability

Consensus and leader election

Raft and Paxos aren't trivia; they're the reason your leader-election design either works or deadlocks. Most interview failures here come down to "we'll elect a leader somehow", with no quorum story and no fencing.
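
A sketch of the fencing half of that story: every lease carries a monotonically increasing token, and the downstream resource rejects writes with a stale one. `LockService` and `Storage` are hypothetical stand-ins; the sketch assumes the lock service is itself backed by a quorum (for example etcd or ZooKeeper), which is not modelled here.

```python
class LockService:
    """Hypothetical lease service that hands out increasing fencing tokens."""

    def __init__(self) -> None:
        self._token = 0

    def acquire(self) -> int:
        # Each newly granted lease gets a strictly larger token.
        self._token += 1
        return self._token


class Storage:
    """Downstream resource that rejects writes carrying a stale token."""

    def __init__(self) -> None:
        self.highest_seen = 0
        self.value = None

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_seen:
            return False  # a newer leader has already written
        self.highest_seen = token
        self.value = value
        return True


locks = LockService()
store = Storage()
old_leader = locks.acquire()  # token 1
new_leader = locks.acquire()  # token 2, issued after the old lease expired
accepted_new = store.write(new_leader, "from-new")
accepted_old = store.write(old_leader, "from-old")  # paused old leader wakes up
```

The old leader still believes it holds the lease, but its write is refused because the storage layer has already seen a higher token.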

Pro · advanced · 15m

Reliability

Active-active multi-region

Active-active is not just active-passive with two active sides; it's a whole conflict-resolution design. You pick one of partitioning (a single writer per key), last-write-wins, or CRDTs. There is no fourth option.
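
Two of those options can be sketched in a few lines: a last-write-wins register and a grow-only counter (G-Counter), one of the simplest CRDTs. The class names, region labels, and timestamps are illustrative assumptions, and the LWW sketch assumes timestamps are comparable across regions, which is a strong assumption in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-write-wins: the highest (timestamp, region) pair survives a merge."""

    value: str
    timestamp: int
    region: str  # deterministic tie-breaker when timestamps are equal

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        return max(self, other, key=lambda r: (r.timestamp, r.region))


class GCounter:
    """Grow-only counter CRDT: one slot per region, merged by element-wise max."""

    def __init__(self, counts=None) -> None:
        self.counts = dict(counts or {})

    def increment(self, region: str, n: int = 1) -> None:
        self.counts[region] = self.counts.get(region, 0) + n

    def merge(self, other: "GCounter") -> "GCounter":
        regions = self.counts.keys() | other.counts.keys()
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in regions}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Concurrent writes in two regions (hypothetical data).
us = LWWRegister("alice@old.example", 100, "us")
eu = LWWRegister("alice@new.example", 105, "eu")
winner = us.merge(eu)

likes_us, likes_eu = GCounter(), GCounter()
likes_us.increment("us", 3)
likes_eu.increment("eu", 2)
total = likes_us.merge(likes_eu).value
```

Both merges give the same answer regardless of argument order. The difference is what gets lost: last-write-wins silently discards the `us` write, while the counter keeps every increment.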

intermediate · 10m

Performance

Backpressure and queueing

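Little's Law isn't optional: L = λW ties the number of items in the system to arrival rate and time in system, and it only holds while the system is stable. Once the producer outruns the consumer, the queue grows without bound. Unbounded queues are a bug, not a feature.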
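
A small sketch of both ideas, using Python's standard `queue.Queue` with a `maxsize` so the producer is told "no" instead of the backlog growing. The `offer` helper and the traffic numbers are assumptions for illustration.

```python
import queue


def offer(q: queue.Queue, item) -> bool:
    """Try to enqueue without blocking; report rejection instead of growing."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False  # caller can shed load, retry later, or slow down


# Bounded queue: the fourth and fifth items are rejected.
q = queue.Queue(maxsize=3)
accepted = [offer(q, i) for i in range(5)]

# Little's Law, L = lambda * W, with assumed numbers:
arrival_rate = 200         # requests per second
avg_time_in_system = 0.05  # seconds per request (waiting plus service)
in_flight = arrival_rate * avg_time_in_system  # about 10 requests in the system
```

Rejecting at the edge turns an invisible, growing backlog into an explicit signal the producer has to handle.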

core · 10m

Security & abuse

Auth and sessions

Session vs JWT is not a religion; it's a trade-off between revocation and statelessness. Most systems need both: short-lived JWT access tokens for speed, and long-lived refresh tokens backed by server state for revocation.

The Delivery Framework — how to run the 45 minutes

New to system design

Interviewing next week

Weak on data modelling

Patterns first

Search when you know the gap.

API & requirements cleanup

Requirements & scope framing

Capacity estimation (back-of-envelope)

Numbers to know

High-level architecture

Networking fundamentals

Async messaging & queues

API contract design

Data model design

Storage choice justification

Sharding & partitioning

Caching strategy

Load balancing & traffic routing

Consistent hashing

Failure mode analysis

Replication & durability

Observability & operations

Consistency trade-offs

Latency budgeting

Abuse prevention & rate limiting

Protocol choice

Real-time transport choices

Idempotency

Indexing strategies

Search indexing

Large blob handling

Data partitioning strategies

CDC and eventing

Geospatial indexing

CDN and edge caching

Delivery semantics

Consensus and leader election

Active-active multi-region

Backpressure and queueing

Auth and sessions

Read-heavy

Write-heavy

Classic request-response

Long-running tasks

Producer-consumer / work queue

Event-driven / saga

Real-time delivery architecture

Fan-out: on write vs on read

Edge caching / CDN-first

Multi-region active-passive / active-active

Search over content

Geospatial / proximity lookup

Large file upload & blob handling

Content recommendation

Rate limiting / quota enforcement

High Availability

Leader election / consensus

PostgreSQL

DynamoDB

Apache Cassandra

Redis

Elasticsearch

Vector Database

Apache Kafka

Apache Flink

S3 / Blob Storage

API Gateway

Load Balancer

CDN (Content Delivery Network)

ZooKeeper / etcd

Message Queue (SQS / RabbitMQ)

Weak on reliability

Weak on trade-offs

Senior → staff calibration