intermediatedeep dive

Indexing strategies

B-tree, LSM, inverted, compound, and geo indexes tied back to access patterns.

~15 min read

The index you picked three months ago decides your query latency today — and the one you didn't create decides which queries you can't ship. Indexing is not "add indexes until it's fast"; it's a first-principles match between query shape and index structure.

Read this if your last attempt…

You've said "we'll add an index" without naming which columns or in what order
You don't know why B-tree vs LSM matters for your workload
You were asked about covering indexes and stalled
You can't explain the write-amplification cost of an index

The concept

An index is a secondary data structure that trades write cost for read cost. Every index speeds up a specific query shape and slows down every write that touches its columns. There is no free index.

Three structural shapes, each optimised for a different query shape:

Architecture diagram· Three index shapes, three workloads

B-tree for reads + range. LSM for writes. Inverted for text. Every index has a write-amplification cost that pays for the read-amplification benefit.

Which index structure for which workload

B-tree	LSM	Inverted	Bitmap	Hash (in-memory)
Read	O(log n) point + range	O(log n) + merge across runs	Term → posting list	AND / OR across low-card columns	O(1) point
Write	Random writes; page splits	Sequential append; compaction later	Batch-build at index time	Build-time; poor online updates	O(1)
Ideal for	SQL primary + secondary	Write-heavy KV / wide-col	Free text, facets, ranking	OLAP with low-card filters	Point lookups, caches
Worst at	Huge write volumes	Range scans across all runs	Primary store	Updates	Range queries

How interviewers grade this

You name the query shape (point, range, prefix, text, aggregate) before you name the index.
You pick B-tree vs LSM vs inverted based on the write:read ratio and the query shape — not out of habit.
For compound queries you specify the column order in the index and justify it by leftmost-prefix.
You explicitly consider covering indexes for the 1–2 highest-QPS queries.
You mention the write cost of the index — "every post write now does 3 index updates" — rather than pretending indexes are free.

Worked example

Scenario: an orders(id, user_id, merchant_id, status, created_at, total) table, ~500M rows, three hot queries.

The queries:

1"My orders, newest first" — WHERE user_id = ? ORDER BY created_at DESC LIMIT 20 (buyer app, very high QPS).
2"Merchant's pending orders" — WHERE merchant_id = ? AND status = 'pending' ORDER BY created_at (seller dashboard).
3"Look up one order" — WHERE id = ? (served by the primary key).

The index set:

Query 3 is free — it rides the primary key (B-tree on id).
Query 1 → compound B-tree on (user_id, created_at DESC). Equality column first, sort column second, descending to match the ORDER BY. The query becomes a bounded prefix walk: no sort step, read 20 rows, stop.
Query 2 → compound B-tree on (merchant_id, status, created_at). Equality columns (merchant_id, status) lead, then the sort column. Because 'pending' is a small slice of any merchant's history, the index stays selective.

Covering decision: the buyer feed (query 1) also needs status and total to render each row. Promote the index to (user_id, created_at DESC) INCLUDE (status, total) so the read returns from the index alone — one I/O, no heap fetch. Worth it precisely because query 1 is the highest-QPS path; I would not pay this for a cold query.

The write cost — say it out loud: every INSERT INTO orders now maintains the primary key plus two secondary indexes = three B-tree updates per write. At 5k orders/s that is 15k index writes/s. That is the budget I am spending to make two read paths fast — acceptable here because the workload is read-heavy, but if this were a 50k/s append-only event log I would reach for an LSM store instead of stacking B-tree indexes.

What I deliberately did NOT index: status alone (low cardinality — the planner would skip it in favour of a scan), total alone (no query filters on it). Adding either would tax every write for zero read benefit.

Good vs bad answer

Interviewer probe

“You have a `posts` table with (id, user_id, created_at, body). The hot query is "latest 20 posts for user X". What indexing strategy?”

Weak answer

"I'd add an index on user_id and an index on created_at so both columns are indexed."

Strong answer

"Compound B-tree index on (user_id, created_at DESC). The query WHERE user_id = ? ORDER BY created_at DESC LIMIT 20 becomes a prefix walk on the index — zero sort, bounded read.

If rendering the post also needs (body, title, likes_count), I'd consider a covering index: (user_id, created_at DESC) INCLUDE (title, likes_count). That lets the query return from the index alone without touching the heap — one I/O instead of two.

Write cost: every post insert updates one compound index instead of two single-column indexes, so strictly cheaper. The tradeoff is that a WHERE created_at > ? without user_id cannot use this index — but that isn't the hot query, so I don't optimise for it."

Why it wins: Names the index structure (B-tree), the column order (user_id first), the sort direction (DESC), and considers a covering variant. Justifies the order by the query shape. Calls out which queries this index does NOT help and is explicit about not over-indexing.

Interview playbook3–5 min inside the data-model / deep-dive portion of the round

When it comes up

The interviewer says "this query is slow" or "the table has 500M rows"
Right after the data model — "how do you serve this read efficiently?"
Any read-heavy path where latency is a stated requirement
A write-heavy workload where you must justify B-tree vs LSM

Order of reveal

1
1. Name the query shape. Before I name an index, let me name the query: this is a point lookup / a range scan / a prefix match / a full-text query. The shape dictates the structure.
2
2. Pick the structure. B-tree for point + range, LSM if writes dominate, inverted index for free text. I pick by the write:read ratio and the query shape, not by habit.
3
3. Compound order + justify. Compound index on (user_id, created_at DESC): equality column first, then the sort column. Leftmost-prefix means this also serves "by user_id" alone but not "by created_at" alone.
4
4. Consider covering. For the one or two hottest queries I INCLUDE the selected columns so the read never touches the heap — one I/O instead of two.
5
5. Price the writes. Each index is a write tax. I am adding two, so every insert does two extra tree updates. I cap the index count and drop anything that does not serve a hot query.

Signature phrases

“Let me name the query shape before I name the index.”

“Equality columns first, then the range, then the sort.”

“I would make that a covering index so the read never hits the heap.”

“Every index is a write tax — I am adding two, so each insert pays twice.”

“Let me name the query shape before I name the index.” — Signals you derive indexes from access patterns, not by reflex.
“Equality columns first, then the range, then the sort.” — The compound-index ordering rule that separates a senior answer from "index both columns".
“I would make that a covering index so the read never hits the heap.” — Shows you optimise the hot path to a single I/O.
“Every index is a write tax — I am adding two, so each insert pays twice.” — Proves you see indexes as a trade, not free speed.

Likely follow-ups

?“When would you NOT add an index?”Reveal

Low-cardinality columns (status, gender) where the planner prefers a scan; write-dominated tables where the amplification outweighs the read win; and any column no hot query filters on. For a selective predicate on a mostly-null column, a partial index beats a full one — smaller and always chosen.

?“How do indexes change once the table is sharded?”Reveal

Secondary indexes become local (per-shard) by default — great for queries that include the partition key, useless for queries that do not (those scatter-gather across every shard). If a non-partition-key lookup is hot, you maintain a global secondary index (a separate, differently-partitioned table kept in sync), accepting its write-time cost and eventual consistency.

?“The table is 2 TB and you need a new index. How do you add it without downtime?”Reveal

Build it online: CREATE INDEX CONCURRENTLY in Postgres (no table lock), or an online schema-change tool (gh-ost, pt-osc) in MySQL that backfills via a shadow table and a trigger/binlog tail. Build on a replica first if the primary is hot, then fail over. Never a bare CREATE INDEX on a 2 TB hot table — it takes a write lock for the duration.

Common mistakes

"Add an index" without stating the column order

An index on (user_id, created_at) serves WHERE user_id=? AND WHERE user_id=? AND created_at > ?. It does NOT serve WHERE created_at > ? on its own. Leftmost-prefix is absolute.

Indexing low-cardinality columns

An index on status when status ∈ {active, deleted} with 90% active is worse than a full scan. The query planner will ignore your index. Consider a partial index instead: CREATE INDEX ... WHERE status = 'active' — much smaller, used reliably.

Ignoring write amplification

Each index is a silent tax on every write. Five indexes on a high-write table can easily turn "insert one row" into "insert one row + update five trees". This is the #1 reason an OLTP DB falls over.

Using the search index as the primary store

Elasticsearch is an index. When it corrupts or drifts, you need a primary store to rebuild from. Dual-write from your DB into ES, with CDC as the reliable mechanism — never the other way.

"We'll use an index for uniqueness" without a DB constraint

An index that "happens to be unique" is not a constraint — concurrent writes can race through it. Use UNIQUE at the schema level so the DB enforces it under every concurrency scenario.

Practice drills

You have an index on (user_id, created_at). Does WHERE created_at > NOW() - '7 days' use it?Reveal

No — leftmost-prefix says you must filter on user_id (or equality-filter it) before created_at is usable. The planner will do a full scan or pick another index. If that query is hot, add an index on (created_at) directly, or reverse the compound order if user_id is only rarely queried alone.

A write-heavy event log at 50k events/s. Postgres B-tree or Cassandra LSM, and why?Reveal

LSM — Cassandra (or RocksDB / ScyllaDB). B-trees do in-place updates with page splits; at 50k/s that becomes random I/O plus lock contention. LSMs append sequentially to a memtable, flush to immutable SSTables, and compact in the background. Writes stay fast; reads pay a (bounded) merge cost across runs.

You have a `users` table with a nullable `deleted_at` and 99% of rows have it null. You need fast "find active users by created_at". What index?Reveal

Partial index: CREATE INDEX active_users_created ON users (created_at) WHERE deleted_at IS NULL. The index stays tiny (1% of rows), the planner always picks it for the "active users" query, and write cost is only paid on the 99% of writes that are active. A full compound index (deleted_at, created_at) would be 100× larger for the same effect.

Cheat sheet

•Match index shape to query shape: B-tree = range, LSM = writes, inverted = text.
•Compound index order: equality columns first, then range, then sort.
•Covering index = query returns from the index alone. Huge latency win on hot paths.
•Cardinality matters — low-card columns usually want a partial index instead.
•Every index taxes every write. Count them; cap them.
•Unique constraint belongs at the schema level, not at the index level.

Practice this skill

These problems exercise Indexing strategies. Try one now to apply what you just learned.

url shortener news feed

Read this if

B-tree

LSM

Inverted

Bitmap

Hash (in-memory)

Read

O(log n) point + range

O(log n) + merge across runs

Term → posting list

AND / OR across low-card columns

O(1) point

Write

Random writes; page splits

Sequential append; compaction later

Batch-build at index time

Build-time; poor online updates

O(1)

Ideal for

SQL primary + secondary

Write-heavy KV / wide-col

Free text, facets, ranking

OLAP with low-card filters

Point lookups, caches

Worst at

Huge write volumes

Range scans across all runs

Primary store

Updates

Range queries