Data-intensive systems
Storage choice, partitioning, replication, and indexing: everything it takes to make data-layer decisions that survive review.
For: Engineers prepping for data-heavy prompts (search, analytics, feeds, storage products)
After this path
Name the right store, the right partition key, the right indexes, and the right replication mode — and defend each.
- 1. Skill
Data model design
Entities, relationships, keys, normalization vs denormalization.
Why this, here: The entity model decides every downstream choice.
- 2. Skill
Storage choice justification
Picking SQL vs KV vs document vs blob vs time-series based on access patterns.
Why this, here: SQL vs KV vs wide-column vs search — the framework for picking.
- 3. Skill
Sharding & partitioning
Partition key selection, hot spots, rebalancing, consistent hashing.
Why this, here: The single most consequential call in distributed data.
Checkpoint
Defend a partition key for a multi-tenant SaaS’s events table. Now describe the hot partition that eventually appears and how you’d reshard without downtime. If the reshard story is blank, the call wasn’t load-bearing yet.
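One way to approach the checkpoint, sketched in Python under stated assumptions: partition by tenant, and spread known-hot tenants across a few salted sub-buckets. The partition count, the `HOT_TENANT_BUCKETS` table, and the tenant names are all hypothetical; a real system would also need a migration plan for rows written before a tenant was salted.

```python
import hashlib

NUM_PARTITIONS = 64                       # assumed fixed partition count
HOT_TENANT_BUCKETS = {"tenant-big": 8}    # hypothetical: tenants salted across 8 sub-buckets


def _stable_hash(value: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def partition_for(tenant_id: str, event_id: str) -> int:
    """Partition for one event: tenant-local unless the tenant is marked hot."""
    buckets = HOT_TENANT_BUCKETS.get(tenant_id, 1)
    bucket = _stable_hash(event_id) % buckets
    return _stable_hash(f"{tenant_id}#{bucket}") % NUM_PARTITIONS


def partitions_to_read(tenant_id: str) -> set[int]:
    """Partitions a tenant-wide query must fan out to."""
    buckets = HOT_TENANT_BUCKETS.get(tenant_id, 1)
    return {_stable_hash(f"{tenant_id}#{b}") % NUM_PARTITIONS for b in range(buckets)}
```

The trade-off to defend: salting relieves the hot partition on writes, but every tenant-wide read for a salted tenant now fans out to up to 8 partitions.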
- 4. Deep dive
Indexing strategies
B-tree, LSM, inverted, compound, and geo indexes tied back to access patterns.
Why this, here: B-tree vs LSM vs inverted, chosen from first principles.
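To illustrate tying a compound index to an access pattern, here is a minimal Python sketch. A sorted list of tuples stands in for a B-tree (an assumption for illustration, not how any particular database stores its index), and the `(tenant, ts, row_id)` key layout is hypothetical.

```python
import bisect

# Compound index on (tenant, ts), with row_id as the payload.
# A sorted list stands in for the ordered structure a B-tree provides.
index = sorted([
    ("a", 10, 1),
    ("a", 20, 2),
    ("b", 15, 3),
    ("a", 30, 4),
])


def scan(tenant: str, ts_from: int, ts_to: int) -> list[int]:
    """Row ids for one tenant with ts_from <= ts < ts_to, via two binary searches."""
    # A shorter tuple sorts before any longer tuple sharing its prefix,
    # so (tenant, ts_from) lands on the first entry with ts >= ts_from.
    lo = bisect.bisect_left(index, (tenant, ts_from))
    hi = bisect.bisect_left(index, (tenant, ts_to))
    return [row_id for _, _, row_id in index[lo:hi]]
```

Usage: `scan("a", 10, 30)` reads one contiguous slice. A query on `ts` alone cannot use the same trick, because entries are ordered by tenant first; that is the leftmost-prefix constraint on compound indexes.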
- 5. Skill
Replication & durability
Leader/follower, sync vs async replication, write quorum, RPO/RTO.
Why this, here: Quorum, sync vs async, RPO / RTO.
Checkpoint
For a payments store: sync replication to N replicas or async to 1? State the RPO you accept and the failure that forces the trade-off. Staff-plus candidates name the number.
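A rough way to "name the number", sketched in Python. The quorum check is the standard overlap condition for N replicas with write quorum W and read quorum R; the exposure estimate is a simplification that assumes a steady write rate and a fixed replication lag, and the function names are hypothetical.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return w + r > n


def acked_writes_at_risk(writes_per_sec: int, replication_lag_ms: int) -> int:
    """Rough bound on acknowledged writes lost if the leader dies under async replication.

    Assumes a steady write rate and a constant lag; with fully synchronous
    replication the lag term is zero, and so is the estimate.
    """
    return writes_per_sec * replication_lag_ms // 1000
```

For example, at an assumed 200 writes per second and 500 ms of lag, roughly 100 acknowledged writes are exposed. Stating that number, and whether a payments store can accept it, is the answer the checkpoint is asking for.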
- 6. Deep dive
CDC and eventing
Outbox, change-data-capture, derived views, dual-write avoidance, and replay safety.
Why this, here: Keep derived stores in sync without dual writes.
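The outbox idea can be sketched with Python's built-in `sqlite3`: the business row and the event row commit in one transaction, and a relay later applies events to a derived view. The table names, the `search_view` dict standing in for a derived store, and the in-memory dedup set are assumptions for illustration, not a production design.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: int) -> None:
    # One transaction: both rows commit or neither does, so there is no dual write.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "order_placed", "order_id": order_id}),),
        )


search_view: dict[int, str] = {}   # stand-in for a derived store
applied_seqs: set[int] = set()     # dedup state that makes replays harmless


def apply_event(seq: int, event: dict) -> None:
    if seq in applied_seqs:        # already applied: a replay is a no-op
        return
    search_view[event["order_id"]] = "placed"
    applied_seqs.add(seq)


def relay_once() -> int:
    """Apply unpublished events in order, marking each published afterwards."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        apply_event(seq, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

If the relay crashes between applying an event and marking it published, the event is delivered again on the next run. That at-least-once behavior is why the consumer deduplicates by sequence number.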
- 7. Pattern
Search over content
Inverted index + ranking service. The hard part isn't indexing — it's relevance, freshness, and a rebuild path.
Why this, here: The canonical derived-view pattern.
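As a small illustration of the pattern, the Python sketch below builds an inverted index from a source-of-truth dict and ranks by raw term frequency. The whitespace tokenizer and the scoring are deliberate simplifications (assumptions, not a recommended relevance model); the point is that the index is derived, so it can always be rebuilt from the source.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    """term -> {doc_id: term frequency}. Rebuildable from docs at any time."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index: dict[str, dict[int, int]], query: str) -> list[int]:
    """Doc ids ranked by summed term frequency, ties broken by doc id."""
    scores: dict[int, int] = defaultdict(int)
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because `build_index` takes only the source documents, changing the tokenizer or the scoring means rebuilding into a fresh index and swapping it in, which is the rebuild path the pattern calls for.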