Geospatial indexing
Geohash, S2/H3 cells, radius search, proximity ranking, and hotspot management.
Rectangular lat/lng queries on a plain B-tree die at a few thousand rows. "Find all riders within 5 km" is not a B-tree query — it needs a spatial index.
Read this if your last attempt…
- You have a "find nearby" feature
- You picked PostGIS without knowing why
- You can't explain geohash vs quadtree vs S2
- You're designing Uber, DoorDash, or anything map-based
The concept
The problem: efficiently find all points within a radius (or a bounding box) of a target point. A B-tree on latitude alone gives you a latitude stripe; intersecting with a longitude stripe gives you a rectangle, but two stripes fetched separately + intersected is O(N) per dimension and doesn't scale.
Spatial indexing strategies:
Longer prefix → smaller cell. Nearby points often share a prefix; edge-cases solved by querying 9 adjacent cells.
Geospatial indexing — pick by use case.
| Tool | Strength | Typical use |
|---|---|---|
| PostGIS (R-tree) | Polygons + points, SQL native | General GIS, maps, boundaries |
| MongoDB 2dsphere | Simple API, JSON-native | Most app-level geo |
| Redis GEO (geohash) | In-memory, very fast | Hot-path proximity queries, ride dispatch |
| S2 (Google) | Global uniform cells | Foursquare, Uber, planet-scale |
| H3 (Uber) | Hexagonal cells | Ride-sharing, uniform neighbour distance |
| Elasticsearch geo | Geo + text combined | Search with location filtering |
How interviewers grade this
- You identify the spatial access pattern (radius, bbox, kNN).
- You pick the index (PostGIS for general GIS, S2/H3 for large-scale, Redis geo for cache tier).
- You address the edge-cell problem for geohash.
- You estimate cell counts / storage.
- You name sharding (by geohash prefix) when the dataset is huge.
Variants
PostGIS (R-tree via GiST)
SQL-native GIS with polygon support.
The default for anything that lives in Postgres. Supports points, polygons, complex shapes, distance, intersections, all via SQL. Scales to tens of millions of points comfortably.
Pros
- +No new infrastructure
- +Polygon and arbitrary-shape support
- +Transactional with the rest of your DB
Cons
- −Same scale ceiling as Postgres itself
- −Less optimised for extreme-QPS proximity at massive scale
Choose this variant when
- Already on Postgres
- General GIS use cases
- Up to ~10M rows
Redis GEO (geohash)
GEOADD and GEORADIUS commands backed by geohash-sorted sets.
In-memory, microseconds per query. Great for the dispatch or "nearby drivers" cache tier. Not the source of truth — reload from primary DB on failure.
Pros
- +Sub-millisecond queries
- +Trivial API
- +Handles millions of points per Redis instance
Cons
- −Memory cost — every point in RAM
- −Geohash edge cases on cell boundaries
- −Not a source of truth
Choose this variant when
- Hot-path proximity queries
- Ride-sharing dispatch
- Cache tier over a truth DB
S2 / H3 custom index
Sharded in-memory cells for planet-scale systems.
When commodity geo indexes don't scale. Uber shards the driver location store by H3 cell id; dispatch looks up by cell and adjacent cells. Custom infrastructure, significant engineering investment.
Pros
- +Planet-scale QPS
- +Near-uniform cells globally
- +Proven at Uber / Google
Cons
- −Custom infrastructure
- −Hard to operate
- −Only justified at very large scale
Choose this variant when
- Ride-sharing at scale
- Global location services
- Proximity is the business
Worked example
Scenario: "Find drivers within 2 km" for ride-sharing, 1M online drivers, 1k dispatch queries/s.
Truth store: Postgres table drivers with location updated via mobile pings every 5 s. PostGIS 2dsphere index on the geometry column.
Hot cache: Redis GEO cluster, sharded by city/region. Every driver ping updates their Redis entry (GEOADD drivers:sf driver_123 -122.41 37.78). Dispatch queries hit Redis first (GEORADIUS drivers:sf -122.41 37.78 2 km).
Rationale:
- Redis GEO is sub-ms, handles 10k+ QPS per shard, perfect for the hot dispatch path.
- PostGIS is the source of truth for failure recovery + historical queries.
- Sharding by region keeps each Redis shard small (100k drivers ≈ 20 MB in RAM).
Failure: Redis dies → dispatch falls back to PostGIS (slower, adds ~20 ms latency but still works). Redis rebuilds from driver pings on restart (5 s to warm up).
Scale beyond this: 10M drivers would justify custom S2/H3 + per-cell shards + in-process caches on the dispatch servers — Uber territory.
Good vs bad answer
Interviewer probe
“How do you find drivers within 2 km of a pickup point?”
Weak answer
"SQL: WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?."
Strong answer
"That does a full scan on anything non-trivial. Use a spatial index. Primary: PostGIS with a 2dsphere GiST index — ST_DWithin gives radius queries in milliseconds. Hot-path cache: Redis GEO sharded by region, sub-ms GEORADIUS. Dispatch hits Redis first; Redis is rebuilt from the PostGIS truth store. At planet scale (millions of drivers globally, 10k+ QPS) I'd move to S2 or H3 cells with custom sharded in-memory indexes — Uber's model. Geohash alone has edge-cell problems that need 9-cell lookups to solve."
Why it wins: Knows why SQL BETWEEN fails, names appropriate indexes, layers cache + truth, and calibrates to scale.
When it comes up
- A "find nearby" feature — drivers, restaurants, friends, stores
- Ride-sharing, delivery, mapping, or store-locator problems
- The interviewer asks for points within a radius or a bounding box
- Anything where the query is "closest to me"
Order of reveal
- 11. Name the access pattern. Radius, bounding box, or k-nearest-neighbours — the shape of the spatial query drives the index choice.
- 22. Reject BETWEEN. A WHERE lat BETWEEN ... AND lng BETWEEN ... does two B-tree scans and intersects them — it dies at a few thousand rows. I use a real spatial index.
- 33. Pick the index. PostGIS 2dsphere for general GIS, Redis GEO for the sub-millisecond hot path, S2 or H3 cells at planet scale.
- 44. Handle geohash edges. With geohash I always query the target cell plus its 8 neighbours, then filter by true distance, so a point just across a cell boundary is not missed.
- 55. Cache over truth + shard. Redis GEO is the hot cache rebuilt from a PostGIS source of truth; at huge scale I shard the index by region or cell.
Signature phrases
- “Find-nearby is not a B-tree query — it needs a spatial index.” — The framing that shows you know why GIS indexes exist.
- “PostGIS for general GIS, Redis GEO for the sub-ms hot path, S2/H3 at planet scale.” — Demonstrates you calibrate the tool to the scale.
- “Geohash always queries the cell plus its eight neighbours.” — Names the edge-cell fix interviewers probe for.
- “Redis GEO is the cache; PostGIS stays the source of truth.” — Shows the cache-over-truth layering.
Likely follow-ups
?“Driver locations update every few seconds. How does the index keep up with the write rate?”Reveal
The high-frequency updates land in an in-memory Redis GEO tier sharded by region — a GEOADD per ping is microseconds, so a region with 100k drivers absorbs the churn easily. The PostGIS source of truth is updated asynchronously (or only periodically) since dispatch reads from Redis. The write rate, not the read rate, is why you do not put this directly on a disk-based spatial index.
?“How do you shard a global proximity index?”Reveal
By spatial cell — S2 or H3 cell id, or coarse region/city. Each shard owns a contiguous patch of the map, so a proximity query routes to the target cell’s shard plus the shards owning adjacent cells, then merges. Hexagonal H3 is popular for ride-sharing because every neighbour is equidistant, which makes the "expand to neighbours" step uniform. Sharding by cell keeps each index small and the query bounded to a handful of shards.
?“Your radius query returns a slightly square-ish result set. Why, and how do you make it exact?”Reveal
Spatial indexes work on grid-aligned cells (geohash boxes, S2/H3 cells), so the first pass returns everything in the covering cells — an approximation shaped like the grid, not a perfect circle. You then run a second pass that filters the candidates by true great-circle (Haversine) distance to the target, dropping the corners. Candidate-gather from the index, then exact-distance filter — that two-step is the standard pattern.
Common mistakes
Two independent B-tree lookups intersected — dies at ~10k rows. Use a real spatial index.
Two nearby points can have completely different geohashes if they're on opposite sides of a cell boundary. Always query the target cell + 8 neighbours; filter by actual distance.
100M points in a single PostGIS index is painful. Shard by region (country, city, S2 cell level).
PostGIS geometry uses planar math; geography uses geodesic. Use geography for lat/lng + km/miles; geometry for projected coordinate systems. Mixing gives silently wrong results.
Practice drills
Explain the geohash edge-cell problem.Reveal
Geohash is a recursive 2D quadtree encoded as a base-32 string. Two nearby points can share a long prefix (same cell) OR have no shared prefix (on opposite sides of a cell boundary). E.g. a point at lat 0.001 and another at lat -0.001 are 220 m apart but in different top-level cells. Fix: always query the target cell PLUS its 8 neighbours, then filter by actual Haversine distance.
Interviewer: "PostGIS or S2?"Reveal
PostGIS if you're in Postgres and scale is moderate (tens of millions of points, hundreds of QPS). S2 if you're planet-scale with global uniform cells and very high QPS — requires custom infrastructure. For 95% of apps, PostGIS. For Uber / Google Maps / Foursquare, S2/H3.
How would you support "find the 10 nearest coffee shops"?Reveal
kNN query. In PostGIS: ORDER BY location <-> ST_Point(lng, lat) LIMIT 10 — the <-> operator uses the spatial index. In geohash-based systems: walk outward from the target cell through neighbours using a priority queue; stop when the 10 nearest are confirmed (no undiscovered cell closer than the 10th candidate). Libraries implement this pattern.
Cheat sheet
- •Never BETWEEN on lat/lng. Use a spatial index.
- •Default: PostGIS 2dsphere. Covers most apps.
- •Hot path: Redis GEO for sub-ms proximity.
- •Planet-scale dispatch: S2 or H3 custom.
- •Geohash? Always query 9 neighbour cells.
- •Shard by region when data outgrows a single index.
Practice this skill
No problem is tagged directly to Geospatial indexing yet. These published problems still exercise the same interview category.
Read this if