Indexing strategies
B-tree, LSM, inverted, compound, and geo indexes tied back to access patterns.
The index you picked three months ago decides your query latency today — and the one you didn't create decides which queries you can't ship. Indexing is not "add indexes until it's fast"; it's a first-principles match between query shape and index structure.
Read this if your last attempt…
- You've said "we'll add an index" without naming which columns or in what order
- You don't know why B-tree vs LSM matters for your workload
- You were asked about covering indexes and stalled
- You can't explain the write-amplification cost of an index
The concept
An index is a secondary data structure that trades write cost for read cost. Every index speeds up a specific query shape and slows down every write that touches its columns. There is no free index.
Three structural shapes, each optimised for a different query shape:
B-tree for reads + range. LSM for writes. Inverted for text. Every index has a write-amplification cost that pays for the read-amplification benefit.
Which index structure for which workload
| B-tree | LSM | Inverted | Bitmap | Hash (in-memory) | |
|---|---|---|---|---|---|
| Read | O(log n) point + range | O(log n) + merge across runs | Term → posting list | AND / OR across low-card columns | O(1) point |
| Write | Random writes; page splits | Sequential append; compaction later | Batch-build at index time | Build-time; poor online updates | O(1) |
| Ideal for | SQL primary + secondary | Write-heavy KV / wide-col | Free text, facets, ranking | OLAP with low-card filters | Point lookups, caches |
| Worst at | Huge write volumes | Range scans across all runs | Primary store | Updates | Range queries |
How interviewers grade this
- You name the query shape (point, range, prefix, text, aggregate) before you name the index.
- You pick B-tree vs LSM vs inverted based on the write:read ratio and the query shape — not out of habit.
- For compound queries you specify the column order in the index and justify it by leftmost-prefix.
- You explicitly consider covering indexes for the 1–2 highest-QPS queries.
- You mention the write cost of the index — "every post write now does 3 index updates" — rather than pretending indexes are free.
Worked example
Scenario: an orders(id, user_id, merchant_id, status, created_at, total) table, ~500M rows, three hot queries.
The queries:
- 1"My orders, newest first" —
WHERE user_id = ? ORDER BY created_at DESC LIMIT 20(buyer app, very high QPS). - 2"Merchant's pending orders" —
WHERE merchant_id = ? AND status = 'pending' ORDER BY created_at(seller dashboard). - 3"Look up one order" —
WHERE id = ?(served by the primary key).
The index set:
- Query 3 is free — it rides the primary key (B-tree on
id). - Query 1 → compound B-tree on
(user_id, created_at DESC). Equality column first, sort column second, descending to match the ORDER BY. The query becomes a bounded prefix walk: no sort step, read 20 rows, stop. - Query 2 → compound B-tree on
(merchant_id, status, created_at). Equality columns (merchant_id, status) lead, then the sort column. Because 'pending' is a small slice of any merchant's history, the index stays selective.
Covering decision: the buyer feed (query 1) also needs status and total to render each row. Promote the index to (user_id, created_at DESC) INCLUDE (status, total) so the read returns from the index alone — one I/O, no heap fetch. Worth it precisely because query 1 is the highest-QPS path; I would not pay this for a cold query.
The write cost — say it out loud: every INSERT INTO orders now maintains the primary key plus two secondary indexes = three B-tree updates per write. At 5k orders/s that is 15k index writes/s. That is the budget I am spending to make two read paths fast — acceptable here because the workload is read-heavy, but if this were a 50k/s append-only event log I would reach for an LSM store instead of stacking B-tree indexes.
What I deliberately did NOT index: status alone (low cardinality — the planner would skip it in favour of a scan), total alone (no query filters on it). Adding either would tax every write for zero read benefit.
Good vs bad answer
Interviewer probe
“You have a `posts` table with (id, user_id, created_at, body). The hot query is "latest 20 posts for user X". What indexing strategy?”
Weak answer
"I'd add an index on user_id and an index on created_at so both columns are indexed."
Strong answer
"Compound B-tree index on (user_id, created_at DESC). The query WHERE user_id = ? ORDER BY created_at DESC LIMIT 20 becomes a prefix walk on the index — zero sort, bounded read.
If rendering the post also needs (body, title, likes_count), I'd consider a covering index: (user_id, created_at DESC) INCLUDE (title, likes_count). That lets the query return from the index alone without touching the heap — one I/O instead of two.
Write cost: every post insert updates one compound index instead of two single-column indexes, so strictly cheaper. The tradeoff is that a WHERE created_at > ? without user_id cannot use this index — but that isn't the hot query, so I don't optimise for it."
Why it wins: Names the index structure (B-tree), the column order (user_id first), the sort direction (DESC), and considers a covering variant. Justifies the order by the query shape. Calls out which queries this index does NOT help and is explicit about not over-indexing.
When it comes up
- The interviewer says "this query is slow" or "the table has 500M rows"
- Right after the data model — "how do you serve this read efficiently?"
- Any read-heavy path where latency is a stated requirement
- A write-heavy workload where you must justify B-tree vs LSM
Order of reveal
- 11. Name the query shape. Before I name an index, let me name the query: this is a point lookup / a range scan / a prefix match / a full-text query. The shape dictates the structure.
- 22. Pick the structure. B-tree for point + range, LSM if writes dominate, inverted index for free text. I pick by the write:read ratio and the query shape, not by habit.
- 33. Compound order + justify. Compound index on (user_id, created_at DESC): equality column first, then the sort column. Leftmost-prefix means this also serves "by user_id" alone but not "by created_at" alone.
- 44. Consider covering. For the one or two hottest queries I INCLUDE the selected columns so the read never touches the heap — one I/O instead of two.
- 55. Price the writes. Each index is a write tax. I am adding two, so every insert does two extra tree updates. I cap the index count and drop anything that does not serve a hot query.
Signature phrases
- “Let me name the query shape before I name the index.” — Signals you derive indexes from access patterns, not by reflex.
- “Equality columns first, then the range, then the sort.” — The compound-index ordering rule that separates a senior answer from "index both columns".
- “I would make that a covering index so the read never hits the heap.” — Shows you optimise the hot path to a single I/O.
- “Every index is a write tax — I am adding two, so each insert pays twice.” — Proves you see indexes as a trade, not free speed.
Likely follow-ups
?“When would you NOT add an index?”Reveal
Low-cardinality columns (status, gender) where the planner prefers a scan; write-dominated tables where the amplification outweighs the read win; and any column no hot query filters on. For a selective predicate on a mostly-null column, a partial index beats a full one — smaller and always chosen.
?“How do indexes change once the table is sharded?”Reveal
Secondary indexes become local (per-shard) by default — great for queries that include the partition key, useless for queries that do not (those scatter-gather across every shard). If a non-partition-key lookup is hot, you maintain a global secondary index (a separate, differently-partitioned table kept in sync), accepting its write-time cost and eventual consistency.
?“The table is 2 TB and you need a new index. How do you add it without downtime?”Reveal
Build it online: CREATE INDEX CONCURRENTLY in Postgres (no table lock), or an online schema-change tool (gh-ost, pt-osc) in MySQL that backfills via a shadow table and a trigger/binlog tail. Build on a replica first if the primary is hot, then fail over. Never a bare CREATE INDEX on a 2 TB hot table — it takes a write lock for the duration.
Common mistakes
An index on (user_id, created_at) serves WHERE user_id=? AND WHERE user_id=? AND created_at > ?. It does NOT serve WHERE created_at > ? on its own. Leftmost-prefix is absolute.
An index on status when status ∈ {active, deleted} with 90% active is worse than a full scan. The query planner will ignore your index. Consider a partial index instead: CREATE INDEX ... WHERE status = 'active' — much smaller, used reliably.
Each index is a silent tax on every write. Five indexes on a high-write table can easily turn "insert one row" into "insert one row + update five trees". This is the #1 reason an OLTP DB falls over.
Elasticsearch is an index. When it corrupts or drifts, you need a primary store to rebuild from. Dual-write from your DB into ES, with CDC as the reliable mechanism — never the other way.
An index that "happens to be unique" is not a constraint — concurrent writes can race through it. Use UNIQUE at the schema level so the DB enforces it under every concurrency scenario.
Practice drills
You have an index on (user_id, created_at). Does WHERE created_at > NOW() - '7 days' use it?Reveal
No — leftmost-prefix says you must filter on user_id (or equality-filter it) before created_at is usable. The planner will do a full scan or pick another index. If that query is hot, add an index on (created_at) directly, or reverse the compound order if user_id is only rarely queried alone.
A write-heavy event log at 50k events/s. Postgres B-tree or Cassandra LSM, and why?Reveal
LSM — Cassandra (or RocksDB / ScyllaDB). B-trees do in-place updates with page splits; at 50k/s that becomes random I/O plus lock contention. LSMs append sequentially to a memtable, flush to immutable SSTables, and compact in the background. Writes stay fast; reads pay a (bounded) merge cost across runs.
You have a `users` table with a nullable `deleted_at` and 99% of rows have it null. You need fast "find active users by created_at". What index?Reveal
Partial index: CREATE INDEX active_users_created ON users (created_at) WHERE deleted_at IS NULL. The index stays tiny (1% of rows), the planner always picks it for the "active users" query, and write cost is only paid on the 99% of writes that are active. A full compound index (deleted_at, created_at) would be 100× larger for the same effect.
Cheat sheet
- •Match index shape to query shape: B-tree = range, LSM = writes, inverted = text.
- •Compound index order: equality columns first, then range, then sort.
- •Covering index = query returns from the index alone. Huge latency win on hot paths.
- •Cardinality matters — low-card columns usually want a partial index instead.
- •Every index taxes every write. Count them; cap them.
- •Unique constraint belongs at the schema level, not at the index level.
Practice this skill
These problems exercise Indexing strategies. Try one now to apply what you just learned.
Read this if