advanceddeep dive

Geospatial indexing

Geohash, S2/H3 cells, radius search, proximity ranking, and hotspot management.

~10 min read

Rectangular lat/lng queries on a plain B-tree die at a few thousand rows. "Find all riders within 5 km" is not a B-tree query — it needs a spatial index.

Read this if your last attempt…

You have a "find nearby" feature
You picked PostGIS without knowing why
You can't explain geohash vs quadtree vs S2
You're designing Uber, DoorDash, or anything map-based

The concept

The problem: efficiently find all points within a radius (or a bounding box) of a target point. A B-tree on latitude alone gives you a latitude stripe; intersecting with a longitude stripe gives you a rectangle, but two stripes fetched separately + intersected is O(N) per dimension and doesn't scale.

Spatial indexing strategies:

Architecture diagram· Geohash: encode (lat,lng) → prefix; prefix length = cell size

Longer prefix → smaller cell. Nearby points often share a prefix; edge-cases solved by querying 9 adjacent cells.

Geospatial indexing — pick by use case.

Tool	Strength	Typical use
PostGIS (R-tree)	Polygons + points, SQL native	General GIS, maps, boundaries
MongoDB 2dsphere	Simple API, JSON-native	Most app-level geo
Redis GEO (geohash)	In-memory, very fast	Hot-path proximity queries, ride dispatch
S2 (Google)	Global uniform cells	Foursquare, Uber, planet-scale
H3 (Uber)	Hexagonal cells	Ride-sharing, uniform neighbour distance
Elasticsearch geo	Geo + text combined	Search with location filtering

How interviewers grade this

You identify the spatial access pattern (radius, bbox, kNN).
You pick the index (PostGIS for general GIS, S2/H3 for large-scale, Redis geo for cache tier).
You address the edge-cell problem for geohash.
You estimate cell counts / storage.
You name sharding (by geohash prefix) when the dataset is huge.

Variants

PostGIS (R-tree via GiST)

SQL-native GIS with polygon support.

The default for anything that lives in Postgres. Supports points, polygons, complex shapes, distance, intersections, all via SQL. Scales to tens of millions of points comfortably.

Pros

+No new infrastructure
+Polygon and arbitrary-shape support
+Transactional with the rest of your DB

Cons

−Same scale ceiling as Postgres itself
−Less optimised for extreme-QPS proximity at massive scale

Choose this variant when

Already on Postgres
General GIS use cases
Up to ~10M rows

Redis GEO (geohash)

GEOADD and GEORADIUS commands backed by geohash-sorted sets.

In-memory, microseconds per query. Great for the dispatch or "nearby drivers" cache tier. Not the source of truth — reload from primary DB on failure.

Pros

+Sub-millisecond queries
+Trivial API
+Handles millions of points per Redis instance

Cons

−Memory cost — every point in RAM
−Geohash edge cases on cell boundaries
−Not a source of truth

Choose this variant when

Hot-path proximity queries
Ride-sharing dispatch
Cache tier over a truth DB

S2 / H3 custom index

Sharded in-memory cells for planet-scale systems.

When commodity geo indexes don't scale. Uber shards the driver location store by H3 cell id; dispatch looks up by cell and adjacent cells. Custom infrastructure, significant engineering investment.

Pros

+Planet-scale QPS
+Near-uniform cells globally
+Proven at Uber / Google

Cons

−Custom infrastructure
−Hard to operate
−Only justified at very large scale

Choose this variant when

Ride-sharing at scale
Global location services
Proximity is the business

Worked example

Scenario: "Find drivers within 2 km" for ride-sharing, 1M online drivers, 1k dispatch queries/s.

Truth store: Postgres table drivers with location updated via mobile pings every 5 s. PostGIS 2dsphere index on the geometry column.

Hot cache: Redis GEO cluster, sharded by city/region. Every driver ping updates their Redis entry (GEOADD drivers:sf driver_123 -122.41 37.78). Dispatch queries hit Redis first (GEORADIUS drivers:sf -122.41 37.78 2 km).

Rationale:

Redis GEO is sub-ms, handles 10k+ QPS per shard, perfect for the hot dispatch path.
PostGIS is the source of truth for failure recovery + historical queries.
Sharding by region keeps each Redis shard small (100k drivers ≈ 20 MB in RAM).

Failure: Redis dies → dispatch falls back to PostGIS (slower, adds ~20 ms latency but still works). Redis rebuilds from driver pings on restart (5 s to warm up).

Scale beyond this: 10M drivers would justify custom S2/H3 + per-cell shards + in-process caches on the dispatch servers — Uber territory.

Good vs bad answer

Interviewer probe

“How do you find drivers within 2 km of a pickup point?”

Weak answer

"SQL: WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?."

Strong answer

"That does a full scan on anything non-trivial. Use a spatial index. Primary: PostGIS with a 2dsphere GiST index — ST_DWithin gives radius queries in milliseconds. Hot-path cache: Redis GEO sharded by region, sub-ms GEORADIUS. Dispatch hits Redis first; Redis is rebuilt from the PostGIS truth store. At planet scale (millions of drivers globally, 10k+ QPS) I'd move to S2 or H3 cells with custom sharded in-memory indexes — Uber's model. Geohash alone has edge-cell problems that need 9-cell lookups to solve."

Why it wins: Knows why SQL BETWEEN fails, names appropriate indexes, layers cache + truth, and calibrates to scale.

Interview playbook2–3 min whenever a location / proximity feature appears

When it comes up

A "find nearby" feature — drivers, restaurants, friends, stores
Ride-sharing, delivery, mapping, or store-locator problems
The interviewer asks for points within a radius or a bounding box
Anything where the query is "closest to me"

Order of reveal

1
1. Name the access pattern. Radius, bounding box, or k-nearest-neighbours — the shape of the spatial query drives the index choice.
2
2. Reject BETWEEN. A WHERE lat BETWEEN ... AND lng BETWEEN ... does two B-tree scans and intersects them — it dies at a few thousand rows. I use a real spatial index.
3
3. Pick the index. PostGIS 2dsphere for general GIS, Redis GEO for the sub-millisecond hot path, S2 or H3 cells at planet scale.
4
4. Handle geohash edges. With geohash I always query the target cell plus its 8 neighbours, then filter by true distance, so a point just across a cell boundary is not missed.
5
5. Cache over truth + shard. Redis GEO is the hot cache rebuilt from a PostGIS source of truth; at huge scale I shard the index by region or cell.

Signature phrases

“Find-nearby is not a B-tree query — it needs a spatial index.”

“PostGIS for general GIS, Redis GEO for the sub-ms hot path, S2/H3 at planet scale.”

“Geohash always queries the cell plus its eight neighbours.”

“Redis GEO is the cache; PostGIS stays the source of truth.”

“Find-nearby is not a B-tree query — it needs a spatial index.” — The framing that shows you know why GIS indexes exist.
“PostGIS for general GIS, Redis GEO for the sub-ms hot path, S2/H3 at planet scale.” — Demonstrates you calibrate the tool to the scale.
“Geohash always queries the cell plus its eight neighbours.” — Names the edge-cell fix interviewers probe for.
“Redis GEO is the cache; PostGIS stays the source of truth.” — Shows the cache-over-truth layering.

Likely follow-ups

?“Driver locations update every few seconds. How does the index keep up with the write rate?”Reveal

The high-frequency updates land in an in-memory Redis GEO tier sharded by region — a GEOADD per ping is microseconds, so a region with 100k drivers absorbs the churn easily. The PostGIS source of truth is updated asynchronously (or only periodically) since dispatch reads from Redis. The write rate, not the read rate, is why you do not put this directly on a disk-based spatial index.

?“How do you shard a global proximity index?”Reveal

By spatial cell — S2 or H3 cell id, or coarse region/city. Each shard owns a contiguous patch of the map, so a proximity query routes to the target cell’s shard plus the shards owning adjacent cells, then merges. Hexagonal H3 is popular for ride-sharing because every neighbour is equidistant, which makes the "expand to neighbours" step uniform. Sharding by cell keeps each index small and the query bounded to a handful of shards.

?“Your radius query returns a slightly square-ish result set. Why, and how do you make it exact?”Reveal

Spatial indexes work on grid-aligned cells (geohash boxes, S2/H3 cells), so the first pass returns everything in the covering cells — an approximation shaped like the grid, not a perfect circle. You then run a second pass that filters the candidates by true great-circle (Haversine) distance to the target, dropping the corners. Candidate-gather from the index, then exact-distance filter — that two-step is the standard pattern.

Common mistakes

BETWEEN on lat/lng columns

Two independent B-tree lookups intersected — dies at ~10k rows. Use a real spatial index.

Geohash without 9-cell lookup

Two nearby points can have completely different geohashes if they're on opposite sides of a cell boundary. Always query the target cell + 8 neighbours; filter by actual distance.

Spatial index over huge dataset without shardingAdvanced

100M points in a single PostGIS index is painful. Shard by region (country, city, S2 cell level).

Confusing distance unitsAdvanced

PostGIS geometry uses planar math; geography uses geodesic. Use geography for lat/lng + km/miles; geometry for projected coordinate systems. Mixing gives silently wrong results.

Practice drills

Explain the geohash edge-cell problem.Reveal

Geohash is a recursive 2D quadtree encoded as a base-32 string. Two nearby points can share a long prefix (same cell) OR have no shared prefix (on opposite sides of a cell boundary). E.g. a point at lat 0.001 and another at lat -0.001 are 220 m apart but in different top-level cells. Fix: always query the target cell PLUS its 8 neighbours, then filter by actual Haversine distance.

Interviewer: "PostGIS or S2?"Reveal

PostGIS if you're in Postgres and scale is moderate (tens of millions of points, hundreds of QPS). S2 if you're planet-scale with global uniform cells and very high QPS — requires custom infrastructure. For 95% of apps, PostGIS. For Uber / Google Maps / Foursquare, S2/H3.

How would you support "find the 10 nearest coffee shops"?Reveal

kNN query. In PostGIS: ORDER BY location <-> ST_Point(lng, lat) LIMIT 10 — the <-> operator uses the spatial index. In geohash-based systems: walk outward from the target cell through neighbours using a priority queue; stop when the 10 nearest are confirmed (no undiscovered cell closer than the 10th candidate). Libraries implement this pattern.

Cheat sheet

•Never BETWEEN on lat/lng. Use a spatial index.
•Default: PostGIS 2dsphere. Covers most apps.
•Hot path: Redis GEO for sub-ms proximity.
•Planet-scale dispatch: S2 or H3 custom.
•Geohash? Always query 9 neighbour cells.
•Shard by region when data outgrows a single index.

Practice this skill

No problem is tagged directly to Geospatial indexing yet. These published problems still exercise the same interview category.

search autocomplete ticket booking news feed

Read this if

Tool

Strength

Typical use

PostGIS (R-tree)

Polygons + points, SQL native

General GIS, maps, boundaries

MongoDB 2dsphere

Simple API, JSON-native

Most app-level geo

Redis GEO (geohash)

In-memory, very fast

Hot-path proximity queries, ride dispatch

S2 (Google)

Global uniform cells

Foursquare, Uber, planet-scale

H3 (Uber)

Hexagonal cells

Ride-sharing, uniform neighbour distance

Elasticsearch geo

Geo + text combined

Search with location filtering