intermediate · deep dive

Data partitioning strategies

Range, hash, directory, tenant, and time partitioning with resharding consequences.

~10 min read

Picking the right partition key is more important than picking the right database. Wrong key = hot shards, unhappy scans, and painful rebalancing — regardless of how good the database is.

Read this if your last attempt…

You picked a partition key by vibes
You can't state how a query routes to a shard
You don't have a hot-key mitigation plan
You need a refresher before designing a multi-TB system

The concept

Partitioning divides a dataset across N nodes. The partition key determines which node owns each row. Three classic strategies:

Range partitioning — partition by ordered ranges (user_id 0–999, 1000–1999, …). Natural for range scans (timestamps). Hotspot risk: all writes land on the last partition if the key is monotonic (auto-increment, now()).
Hash partitioning — hash the key, mod by N. Even distribution by default. No efficient range scans. Resharding on N change is brutal unless you use consistent hashing.
Directory / lookup — explicit mapping (user_id → shard_id) stored in a small coordination service. Flexible; adds a lookup hop. Used when access patterns are irregular (some users massive, most small).
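The three routing rules above can be sketched side by side. This is a minimal illustration, not any particular database's router: the shard count, the range bounds, the `DIRECTORY` contents, and all function names are assumptions made up for the example.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # assumed shard count for illustration


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process for strings)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


# Range: shard i owns every key below RANGE_UPPER_BOUNDS[i]; the last shard owns the rest.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def route_by_range(user_id: int) -> int:
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)


# Hash: hash the key, mod by N.
def route_by_hash(user_id: int) -> int:
    return stable_hash(str(user_id)) % NUM_SHARDS


# Directory: explicit placement, with hash routing as an assumed fallback.
DIRECTORY = {42: 3}  # hypothetical entry: user 42 is pinned to shard 3


def route_by_directory(user_id: int) -> int:
    return DIRECTORY.get(user_id, route_by_hash(user_id))
```

Neighbouring ids stay together under range routing (`route_by_range(999)` and `route_by_range(998)` both return shard 0), which is what makes scans cheap and monotonic keys dangerous.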

Architecture diagram · Three partitioning strategies

Range: natural scans, monotonic hotspot. Hash: even, no scans. Directory: flexible, extra hop.

Partitioning strategies — pick by query pattern.

Strategy	Best for	Weakness
Range by key	Range scans (time windows, pagination)	Monotonic keys = hotspot on last shard
Hash (simple mod N)	Even distribution, known N	Rehash storm when N changes
Consistent hash (ring + vnodes)	Dynamic scaling	No efficient range scans
Directory / lookup	Uneven tenants, explicit control	Lookup hop + coordination service
Compound (hash + range)	Both point and range queries	More complex key design

How interviewers grade this

You state the partition key and why it matches the dominant query.
You name the strategy (range / hash / directory / consistent hash).
You identify hot-key risk and the mitigation.
You describe how resharding works when you add nodes.
You acknowledge queries that don't use the partition key need fan-out.

Variants

Consistent hash ring

Hash both keys and nodes onto a ring; each key goes to the next node clockwise.

Modern default for caches and wide-column stores. Virtual nodes smooth the distribution. Adding a node moves only ~1/N of the keys instead of reshuffling everything.
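A minimal sketch of such a ring, to make the "~1/N of the keys" claim concrete. The class name, the node names, the choice of SHA-256, and the 100 virtual nodes per node are all assumptions for illustration; production systems differ in the details.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (first 64 bits of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted virtual-node positions
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring at `vnodes` positions.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # Next virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._points, ring_hash(key))
        return self._owner[self._points[idx % len(self._points)]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-e")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)  # expected to land near 1/5
```

Every key in `moved` ends up on `node-e`; no key shuffles between the four existing nodes, which is the property that mod-N routing lacks.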

Pros

+Trivial scaling up/down
+Even distribution with vnodes
+Widely deployed (Cassandra and other Dynamo-style stores; Redis Cluster uses a related fixed hash-slot scheme rather than a ring)

Cons

−No range scans
−Does not solve hot-key problem

Choose this variant when

Cache clusters
Wide-column stores
When N will change over time

Range partitioning

Partitions own contiguous ranges of the key; routing is by range lookup.

Great for time-series and pagination. Split a hot range in two when it grows. Be careful: auto-increment ids or timestamps make the newest partition a write hotspot — bucket with a hash prefix if needed.
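Routing by range and splitting a hot range can be sketched with a sorted list of lower bounds. The `RangeRouter` class and the shard names are hypothetical; a real router would also have to move the data for the upper half of a split range, which this sketch leaves out.

```python
import bisect


class RangeRouter:
    """Partition i owns keys in [lower_bounds[i], lower_bounds[i + 1])."""

    def __init__(self, lower_bounds, shards):
        if list(lower_bounds) != sorted(lower_bounds) or len(lower_bounds) != len(shards):
            raise ValueError("need one shard per sorted lower bound")
        self.lower_bounds = list(lower_bounds)
        self.shards = list(shards)

    def route(self, key):
        # Find the last lower bound that is <= key.
        idx = bisect.bisect_right(self.lower_bounds, key) - 1
        if idx < 0:
            raise KeyError(f"{key!r} is below the first range")
        return self.shards[idx]

    def split(self, at_key, new_shard):
        """Split the range containing at_key; keys >= at_key in that range go to new_shard."""
        idx = bisect.bisect_right(self.lower_bounds, at_key)
        if idx == 0 or self.lower_bounds[idx - 1] == at_key:
            raise ValueError("split point must fall strictly inside an existing range")
        self.lower_bounds.insert(idx, at_key)
        self.shards.insert(idx, new_shard)


router = RangeRouter([0, 1000, 2000], ["s0", "s1", "s2"])
owner_before_split = router.route(1500)  # "s1"
router.split(1500, "s3")                 # s1 keeps [1000, 1500), s3 takes [1500, 2000)
```

Only the split range changes owner; `router.route(500)` and `router.route(2500)` return the same shards as before, which is the incremental rebalancing listed under Pros.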

Pros

+Natural range scans
+Incremental rebalancing via range splits
+Good for time-series

Cons

−Monotonic-key hotspot
−Router must track range bounds

Choose this variant when

Time-series
Lexicographically-queried keys
Workloads dominated by range scans

Directory / lookup

Explicit (key → shard) map in a coordination service.

For uneven tenants or workloads where strategy-based routing can't express the required placement. Move a specific big tenant to its own shard without rehashing the world.
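A small sketch of moving one big tenant without touching anyone else. `ShardDirectory`, `pin`, the tenant ids, and the shard names are hypothetical, and the hash fallback for unpinned tenants is an assumption; the in-memory dict stands in for the coordination service.

```python
import hashlib


class ShardDirectory:
    """Explicit tenant -> shard overrides on top of a default hash placement."""

    def __init__(self, num_shared_shards: int):
        self.num_shared_shards = num_shared_shards
        self._overrides = {}  # tenant_id -> shard name

    def _default_shard(self, tenant_id: str) -> str:
        digest = hashlib.sha256(tenant_id.encode()).digest()
        return f"shared-{int.from_bytes(digest[:8], 'big') % self.num_shared_shards}"

    def lookup(self, tenant_id: str) -> str:
        # An explicit entry wins; everyone else uses the default placement.
        if tenant_id in self._overrides:
            return self._overrides[tenant_id]
        return self._default_shard(tenant_id)

    def pin(self, tenant_id: str, shard: str) -> None:
        """Record that a tenant now lives on a specific shard (data move not shown)."""
        self._overrides[tenant_id] = shard


directory = ShardDirectory(num_shared_shards=8)
tenants = ["tenant-big"] + [f"tenant-{i}" for i in range(100)]
placement_before = {t: directory.lookup(t) for t in tenants}

directory.pin("tenant-big", "dedicated-1")
placement_after = {t: directory.lookup(t) for t in tenants}

changed = [t for t in tenants if placement_before[t] != placement_after[t]]
```

`changed` contains only `tenant-big`. In a real deployment every request pays for `lookup`, which is why the Cons below call for caching and a highly available directory.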

Pros

+Maximum flexibility
+Easy per-tenant isolation
+Good for multi-tenancy with skewed tenants

Cons

−Directory is on the hot path — cache aggressively
−Directory itself must be HA

Choose this variant when

Multi-tenant SaaS with big + small tenants
When predictable placement matters more than simplicity

Worked example

Design: partitioning for a messaging app, 100M users, skewed activity.

Dominant query: "recent messages for user X" — strongly user-scoped.

Choice: hash partition by user_id. 256 shards to start; consistent-hash ring for growth.

Hot key handling: celebrity users (million-follower accounts) produce disproportionate fan-out on write (message delivered to followers). Mitigation:

Outbound delivery fanned through Kafka partitioned by recipient, so delivery load distributes even if one sender is hot.
Inbox reads for celebrities cached aggressively at CDN.
Outbox for celebrities stored in a separate shard with dedicated capacity.

Resharding: 100M users over 256 shards is roughly 390k users per shard at steady state. When a shard exceeds 500k, we split it using consistent-hash ring operations: add 2 new virtual nodes, migrate the affected key ranges with a dual-write window, verify, then cut over. This takes hours per shard and runs in the background.

Query patterns that DON'T match:

"all messages sent in the last 5 minutes across everyone" = fan-out to all 256 shards. Expensive; we don't support it from the primary path. Instead, a separate analytics pipeline (CDC → ClickHouse) handles time-windowed global queries.

Good vs bad answer

Interviewer probe

“How do you shard your messaging data?”

Weak answer

"Hash partition by message_id. Even distribution."

Strong answer

"Hash by user_id, not message_id. The dominant query is 'recent messages for user X' — if I partition by message_id, every read fans out to all shards. user_id keeps reads shard-local. Consistent-hash ring with virtual nodes for easy scaling. Celebrity hot keys mitigated by (1) aggressive CDN caching of their inboxes, (2) Kafka fan-out for delivery so recipients distribute the write load. Global-time queries (analytics) go through a separate CDC-fed warehouse, not the primary shards. Resharding is a ring op + dual-write migration; done per shard when a shard exceeds ~500k users."

Why it wins: Matches key to dominant query, names hot-key mitigation, explains scaling, acknowledges the queries that don't fit.

Interview playbook · 3 min · when the data outgrows a single node

When it comes up

The dataset is multi-TB and will not fit on one node
The interviewer asks "how do you shard this?"
Write or storage volume exceeds a single primary
A hot tenant / celebrity / viral item threatens one shard

Order of reveal

1. Key matches the dominant query. I pick the partition key that matches the most common access pattern, so the hot query stays on one shard instead of fanning out to all of them.
2. Strategy. Hash for even distribution, range for scan-heavy workloads, directory when tenants are wildly uneven.
3. Consistent hashing. For elastic membership I use a consistent-hash ring with virtual nodes so adding a node moves ~1/N of keys, not all of them.
4. Hot-key plan. I name the hot-key risk up front and the mitigation: cache it, split it with a suffix, or isolate it onto its own shard.
5. Cross-partition queries. Queries that do not include the partition key scatter-gather; I push those to an analytics replica rather than the primary shards.

Signature phrases

“The partition key matters more than the database.” — Reframes sharding as an access-pattern decision, which is the senior lens.
“Pick the key that matches the dominant query, or every read fans out.” — Ties the key choice directly to latency and cost.
“Consistent hashing so adding a node moves 1/N of keys, not all of them.” — Shows you know why naive mod-N is a trap.
“Name the hot key before launch — one celebrity takes down a shard.” — Demonstrates you anticipate skew instead of discovering it in prod.

Likely follow-ups

“A query does not include the partition key. How is it served?”

It scatter-gathers: the router fans the query to every shard, each returns partial results, and a coordinator merges them — O(N shards) per query, fine occasionally but not on the hot path. If such a query is frequent, I maintain a global secondary index (a separate table partitioned by that other key) or serve it from an analytics replica fed by CDC, so the primary shards stay dedicated to their access pattern.
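The scatter-gather path can be sketched as follows. The in-memory `SHARDS` lists, the `(timestamp, message)` row shape, and the function names are assumptions for illustration; a real coordinator would issue network calls and handle slow or failed shards.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

# Each hypothetical shard holds (timestamp, message) rows sorted newest first.
SHARDS = [
    [(105, "m5"), (101, "m1")],
    [(104, "m4"), (99, "m0")],
    [(103, "m3"), (102, "m2")],
]


def query_shard(shard, since):
    """Partial result from one shard; filtering preserves the newest-first order."""
    return [row for row in shard if row[0] >= since]


def scatter_gather(shards, since, limit):
    # Scatter: one request per shard, so the cost grows with the shard count.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: query_shard(shard, since), shards))
    # Gather: k-way merge of the already-sorted partials, then keep the top `limit`.
    merged = heapq.merge(*partials, key=lambda row: row[0], reverse=True)
    return list(itertools.islice(merged, limit))


latest = scatter_gather(SHARDS, since=100, limit=3)
# latest == [(105, "m5"), (104, "m4"), (103, "m3")]
```

Note that every shard does work even though only three rows are returned, which is why this pattern is kept off the hot path.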

“Walk me through adding a shard to a consistent-hash ring.”

The new node is hashed onto the ring at its virtual-node positions. Only the keys falling between each new vnode and its predecessor move — about 1/N of the total — and they migrate from exactly one existing node per range. During migration I dual-read (check new owner, fall back to old) until the copy completes, then cut over and drop the old copies. Contrast with mod-N, where changing N remaps nearly every key at once and cold-starts the whole cache.
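The dual-read window for one migrating key range might look like the sketch below. Plain dicts stand in for the old and new owners, and the `MigratingRange` class is hypothetical. Sending writes straight to the new owner during the copy is an assumption of this sketch (the answer above only specifies the read path), and deletes are not handled.

```python
class MigratingRange:
    """Serves one key range while it is copied from the old owner to the new one."""

    def __init__(self, old_store: dict, new_store: dict):
        self.old_store = old_store
        self.new_store = new_store
        self.cut_over = False

    def read(self, key):
        # Dual-read: check the new owner first, fall back to the old one.
        if key in self.new_store:
            return self.new_store[key]
        if not self.cut_over:
            return self.old_store.get(key)
        return None

    def write(self, key, value):
        # Assumption: new writes land on the new owner only.
        self.new_store[key] = value

    def copy_batch(self, keys):
        # setdefault never overwrites a value written after the migration started.
        for key in keys:
            if key in self.old_store:
                self.new_store.setdefault(key, self.old_store[key])

    def finish(self):
        # Cut over, then drop the old copies.
        self.cut_over = True
        self.old_store.clear()


old_owner = {"a": 1, "b": 2}
migration = MigratingRange(old_owner, {})
read_during_copy = migration.read("a")  # 1, served by the old owner via fallback
migration.write("a", 10)                # fresher value lands on the new owner
migration.copy_batch(["a", "b"])        # copies "b"; keeps the fresher "a"
migration.finish()
```

After `finish()`, reads of both keys are served by the new owner alone (`10` and `2`).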

“Range or hash partitioning for a time-series workload?”

Range by time enables the killer query — efficient scans over a time window — but a monotonic key (now()) sends every write to the newest partition, a hotspot. The usual compromise is a compound key that buckets: partition by (date, hash(entity) % 16) so writes spread across 16 buckets per day while still allowing day-ranged scans. Pure hash kills the range scan; pure range hotspots the tail — bucketing balances both.
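The bucketed compound key from that answer, sketched with 16 buckets per day. The function names and the `sensor-7` entity id are made up for the example, and SHA-256 is just one stable hash that would work.

```python
import hashlib
from datetime import date

NUM_BUCKETS = 16  # buckets per day, as in the answer above


def bucket_for(entity_id: str) -> int:
    """Stable bucket so the same entity always lands in the same bucket."""
    digest = hashlib.sha256(entity_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def partition_key(day: date, entity_id: str):
    """Write path: (date, bucket) spreads one day's writes over NUM_BUCKETS partitions."""
    return (day.isoformat(), bucket_for(entity_id))


def partitions_for_day(day: date):
    """Read path: a day-ranged scan visits a bounded set of NUM_BUCKETS partitions."""
    return [(day.isoformat(), bucket) for bucket in range(NUM_BUCKETS)]


write_key = partition_key(date(2024, 1, 15), "sensor-7")
scan_targets = partitions_for_day(date(2024, 1, 15))
```

The trade is explicit: a day scan costs 16 partition reads instead of 1, in exchange for the day's writes no longer piling onto a single partition.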

Common mistakes

Partition key that doesn't match dominant query

Reads fan out to all shards; latency + cost explode. The key must align with your common access pattern.

Monotonic key with range partitioning

Auto-increment ids or timestamps write to the last shard only. Bucket with a hash prefix or use hash partitioning.

Simple mod N

Adding a node rehashes every key. Use consistent hashing.

No hot-key plan (advanced)

One celebrity, one viral product, one power user will bring down a shard. Identify hot keys up-front and have a mitigation.

Practice drills

Why isn't simple `hash(key) mod N` enough?

Adding a node changes N → nearly every key remaps → massive data movement. Consistent hashing moves only ~1/N of keys when one is added. For a stable-N cache, mod N is fine. For any system where N will grow, use consistent hashing.
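A quick way to see the size of the problem is to count how many keys change shard when N goes from 4 to 5 under mod-N routing. The key names and the SHA-256 hash are assumptions for the sketch. For a uniform hash, a key stays put only when its hash has the same remainder mod 4 and mod 5, which holds for 4 of every 20 hash values, so about 80% of keys move.

```python
import hashlib


def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def remapped_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
fraction = remapped_fraction(keys, old_n=4, new_n=5)  # close to 0.8
```

Compare that with the roughly 1/5 of keys a consistent-hash ring would move for the same 4 to 5 change.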

Interviewer: "What if one user has 10× the data of the average user?"

Hash partitioning places all of that one user's rows on a single shard, so that shard becomes skewed. Options: (1) shard their data at a finer granularity (by (user_id, object_type) or (user_id, time_bucket)); (2) move that user to a dedicated shard via directory partitioning; (3) cache their hot reads so the shard isn't read-bound; (4) if the user is genuinely 100×, it's a product decision: a separate tier.
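Option (1) can be sketched as a sub-key that only heavy users get. The `HEAVY_USERS` set, the monthly bucket, the 256-shard count, and the function names are all assumptions for illustration; how heavy users are detected is out of scope here.

```python
import hashlib
from datetime import datetime

NUM_SHARDS = 256
HEAVY_USERS = {"user-123"}  # hypothetical set, e.g. maintained from usage metrics


def shard_for(partition_key: str) -> int:
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def partition_key(user_id: str, created_at: datetime) -> str:
    # Heavy users are split by (user_id, month); everyone else stays on one key.
    if user_id in HEAVY_USERS:
        return f"{user_id}:{created_at:%Y-%m}"
    return user_id


def read_keys(user_id: str, months):
    """Keys a reader must visit; months are strings such as '2024-01'."""
    if user_id in HEAVY_USERS:
        return [f"{user_id}:{month}" for month in months]
    return [user_id]


heavy_key = partition_key("user-123", datetime(2024, 1, 5))  # "user-123:2024-01"
normal_key = partition_key("user-9", datetime(2024, 1, 5))   # "user-9"
```

The cost moves to the read side: a query over a heavy user's full history now touches one partition per month instead of one partition in total.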

You're at the whiteboard. Shard count: 16 or 256?

Err higher. Resharding is the painful operation; you can run 256 shards on 8 physical nodes (multi-tenant the shards) initially, then scale nodes as shards grow. Going from 16 to 256 shards later means re-splitting the data in every shard, which is far more painful than starting with 256 logical shards and growing from 8 physical nodes toward 256 over time.
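The logical-versus-physical split can be sketched as a placement map. Round-robin placement and the node names are assumptions for the example; the point is that scaling out reassigns whole shards while the key-to-shard mapping stays fixed.

```python
from collections import Counter

NUM_LOGICAL_SHARDS = 256  # fixed up front; keys never change logical shard


def place_shards(nodes):
    """Assign each logical shard to a physical node, round-robin."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_LOGICAL_SHARDS)}


eight_nodes = place_shards([f"node-{i}" for i in range(8)])
sixteen_nodes = place_shards([f"node-{i}" for i in range(16)])

shards_per_node = Counter(eight_nodes.values())  # 32 logical shards on each of 8 nodes
moved_shards = [
    shard for shard in range(NUM_LOGICAL_SHARDS)
    if eight_nodes[shard] != sixteen_nodes[shard]
]  # 128 shards relocate, each as a whole unit
```

Doubling the fleet moves half the shards, each as a bulk copy of an existing shard, with no per-key rehashing.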

Cheat sheet

•Partition key = the axis of your dominant query.
•Hash for even distribution; range for scans; directory for uneven tenants.
•Consistent hashing for dynamic N.
•Hot keys: cache, split, or isolate. Plan before launch.
•Cross-shard queries exist but are expensive — offload to analytics pipeline.
•Resharding is an operational migration — plan the runbook.

