Geospatial / proximity lookup
Geohash / S2 / H3 spatial index for "nearby X" queries. Straight lat/lng on a B-tree dies at a few thousand rows.
When to reach for this
Reach for this when…
- "Find N nearest" queries
- Bounded-radius search ("within 2 km")
- Real-time location tracking + dispatch
- Map-viewport (bounding-box) queries
- Geofencing / "is this point inside region X?"
Not really this pattern when…
- Locations are fixed and few (< ~1000) — load into memory and linear-scan
- Lookup is by region id, not coordinates (that's a plain index)
- You only ever need exact-point lookup by key (primary index)
Good vs bad answer
Interviewer probe
“Find the nearest available drivers within 2 km of a pickup point.”
Weak answer
"SELECT * FROM drivers WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?."
Strong answer
"That's two B-trees intersected — it fetches whole latitude and longitude bands and dies past a few thousand rows, because the cost is proportional to the bands, not the answer. I'd use a real spatial index. Durable truth is PostGIS with a GiST index on a geography column — ST_DWithin returns matches in milliseconds and KNN via <-> gives nearest-K. For the dispatch hot path I put a Redis GEO tier in front, sharded by city, for sub-millisecond GEOSEARCH; driver pings update Postgres (truth) and Redis (fast path), and if Redis dies dispatch falls back to PostGIS at higher latency while the hot tier rebuilds. The query is three stages: spatial recall from Redis, then filter to available + matching vehicle + non-stale (I attach last_seen/TTL), then rank by ETA — road-aware, not straight-line — and offer the best driver. Near a shard boundary I query the adjacent region shard too. At planet scale I'd move the hot path to H3 cells over a distributed store, query kRing neighbours, and split hot cells like stadium exits to finer resolution."
Why it wins: Rejects BETWEEN with the *reason*, names the right index and the geography column, layers durable truth + rebuildable hot tier with a fallback, separates recall/filter/rank, handles staleness and shard boundaries, uses ETA over straight-line distance, and calibrates to planet scale. The weak answer is a full scan with no spatial index, no freshness, and no ranking.
Cheat sheet
- •Never BETWEEN on lat/lng — two B-trees intersected, cost scales with the dataset.
- •Default durable store: PostGIS GiST on a geography column (not geometry).
- •Hot path: Redis GEO for sub-ms GEOSEARCH; shard by region.
- •Planet scale: H3 / S2 cells over a distributed store (Uber/Google territory).
- •Boundary fix: query the target cell + neighbour ring, then filter true distance.
- •Always Haversine-filter after the cell/box query — cells are approximations.
- •Hot tier is a rebuildable cache; durable store owns truth and is the fallback.
- •Dispatch = spatial recall → business filter → rank by ETA (not straight-line).
- •Attach last_seen/TTL; exclude stale objects; tune heartbeat cadence.
- •Dense cells are hot partitions hidden by average QPS — split to finer resolution.
- •Resolution: coarse = fewer cells, more candidates; fine = more cells, wider ring.
- •Scale tiers: <10M single PostGIS; 10–100M sharded PostGIS + Redis; 100M+ H3/S2.
Core concept
A spatial index answers "find all points within radius R of (lat, lng)" — or "the K nearest" — without scanning the dataset. The reason it needs special machinery is that two ordinary B-trees don't compose into a 2-D index: indexing latitude and longitude separately lets you fetch everything in a latitude band and everything in a longitude band, but their intersection is still an enormous candidate set spread far from your point.
A lat/lng BETWEEN query fetches everything in a latitude band ∪ a longitude band and intersects — a huge candidate set far from the point.
Real spatial indexes solve this with one of two families:
- Tree-based (R-tree / GiST in PostGIS): recursively partitions space into bounding boxes, so a query walks only the boxes that overlap the search region.
- Cell / grid-based (geohash, Google S2, Uber H3): map each point to a hierarchical cell id (a string or integer). "Nearby" becomes "same or neighbouring cell," reducing proximity to a prefix/id lookup that any key-value store can serve.
The main systems, by job:
- PostGIS (Postgres + GiST/SP-GiST): R-tree-based, handles points and polygons. The default durable truth store when you're already on Postgres.
- MongoDB 2dsphere: JSON-native, simple API — fine for most apps.
- Redis GEO: a geohash-backed sorted set with sub-millisecond
GEOSEARCH/GEORADIUS. The ideal hot dispatch tier. - Elasticsearch geo_point: combine geo with full-text and aggregations.
- S2 / H3: cell libraries for planet-scale, uniform-cell custom infrastructure (Uber territory).
Two points near a cell edge have different cell ids. Always search the target cell and its ring, then filter by true distance.
The boundary trap is the single most-tested subtlety. Two points 50 m apart but on opposite sides of a cell edge get different cell ids. Querying only the target cell silently misses them. The fix is universal across geohash/S2/H3: query the target cell plus its ring of neighbours, then filter by the true (Haversine) distance. State this unprompted and you've signalled you actually understand grid indexing rather than having memorised "use geohash."
At scale, separate truth from the hot path. A ride-hailing system persists every driver ping to a durable store (PostGIS) for recovery and analytics, while a Redis GEO tier — sharded by city/region so each shard stays small — serves the sub-millisecond dispatch loop. Redis is a rebuildable cache, not the truth: if it dies, dispatch falls back to PostGIS at higher latency and the hot tier is repopulated from the durable store.
PostGIS holds durable state; a per-region Redis GEO tier serves sub-millisecond proximity reads; dispatch falls back to PostGIS if Redis is gone.
Interview walkthrough
Worked example: nearest available drivers within 2 km of a pickup
PostGIS holds durable state; a per-region Redis GEO tier serves sub-millisecond proximity reads; dispatch falls back to PostGIS if Redis is gone.
Step 0 — reject the naive query. lat/lng BETWEEN is two independent B-tree lookups intersected; it fetches whole bands and its cost scales with the dataset, not the answer. Use a spatial index.
Step 1 — durable truth. PostGIS with a GiST index on a geography column. ST_DWithin(location, pickup, 2000) returns everyone within 2 km in milliseconds; ORDER BY location <-> pickup LIMIT k gives nearest-K, both index-backed. geography (not geometry) so distances are geodesically correct.
Step 2 — hot dispatch tier. Real-time matching needs sub-millisecond reads, so put a Redis GEO tier in front, sharded by city. Drivers GEOADD their position every few seconds; the update writes to Postgres (truth) and Redis (fast path). Dispatch runs GEOSEARCH ... BYRADIUS 2 km against the city shard.
Spatial index returns nearby candidates; business rules (availability, vehicle, ETA) filter; a scorer ranks the survivors.
Step 3 — the matching pipeline (recall → filter → rank).
- Recall:
GEOSEARCHreturns all candidates within (a slightly generous) 2 km. - Filter: keep only drivers who are available, match the requested vehicle type, and are not stale — each position carries
last_seen, and anything older than ~10 s is excluded. - Rank: score survivors by ETA (road- and traffic-aware, not straight-line — a driver across an un-bridged river is farther than the metres suggest) plus rating and acceptance likelihood, then offer the trip to the best one.
Step 4 — boundaries and freshness.
- A pickup near a city-shard edge also queries the neighbouring shard, then merges and distance-filters — the neighbour-ring idea at shard granularity.
- Heartbeat cadence is tuned: faster pings on-trip or near likely matches, slower when idle, to balance battery and write load against accuracy. At fleet scale the ping firehose is ingested through a stream and only the latest position is materialised into Redis.
Each location carries a last_seen timestamp; dispatch excludes stale drivers; heartbeat cadence trades battery for accuracy.
Step 5 — failure and scale.
- Redis down → dispatch falls back to PostGIS (slower but correct); Redis repopulates from truth.
- Dense hot cell (stadium exit) → split it to a finer resolution to spread load.
- Past ~100M points / global QPS → move the hot path to H3 cells over a distributed store, querying
kRingneighbours.
Result. Sub-millisecond candidate recall, road-aware ranking, fresh-only drivers, graceful degradation when the cache dies, and a clean path from one PostGIS node to planet-scale cell sharding — all anchored on a durable truth store the hot tier can always be rebuilt from.
Interview playbook
When it comes up
- The prompt asks for nearest, nearby, radius, map viewport, dispatch, or geofencing
- Objects (drivers, couriers, users) move and are searched by coordinate
- Plain indexes on latitude and longitude would scan too much data
- A ride-hailing / delivery / "near me" discovery system appears
Order of reveal
- 1Reject naive range scans. Two B-trees on lat/lng are not a spatial index — they fetch whole bands and the cost scales with the dataset, not the answer.
- 2Pick the index by scale. PostGIS for durable truth, Redis GEO for the sub-ms hot path, H3/S2 cells for planet scale.
- 3Handle cell boundaries. Query the target cell plus its neighbour ring, then filter by true Haversine distance.
- 4Separate truth from hot tier. The hot tier is a rebuildable cache; the durable store owns state and is the fallback.
- 5Make dispatch recall → filter → rank. Spatial index gives candidates; filter by availability and staleness; rank by ETA, not straight-line distance.
- 6Address freshness. last_seen/TTL excludes stale objects; heartbeat cadence trades battery and write load for accuracy.
Signature phrases
- “Query the cells, then filter the distance” — Names the correct two-step spatial workflow and the boundary fix.
- “Redis GEO is the hot tier, not the truth” — Shows the durable-truth-plus-rebuildable-cache instinct.
- “Cost scales with the bands, not the answer” — Explains precisely why BETWEEN fails.
- “ETA, not straight-line distance” — Shows distance is a proxy and ranking is road-aware.
- “Resolution is a fan-out vs precision dial” — Connects H3/S2 cell size to candidate count and ring width.
- “last_seen makes location perishable” — Names freshness as a first-class concern.
Likely follow-ups
?“What if the pickup is near a shard / cell boundary?”Reveal
Query the neighbouring cells (kRing) or adjacent region shards as well, merge the candidates, and filter by true distance. Boundaries are an internal concern — the system owns the overlap so the user never sees a missed neighbour. For nearest-K I expand rings outward until no unsearched ring could contain a closer point than the current K-th result.
?“What if driver pings are stale?”Reveal
Every position carries last_seen (and often a TTL so it auto-expires). Dispatch excludes anyone older than a threshold — a 40-second-old position sends a rider to where the car was. I tune heartbeat cadence adaptively: faster on-trip or near a likely match, slower when idle in a quiet area, to balance battery and write load against accuracy. At fleet scale I ingest the ping firehose through a stream and only materialise the latest position.
?“A downtown cell is melting while global QPS looks fine. What do you do?”Reveal
That's a hot partition hidden by averages, so I watch per-cell query and update rates, not just global QPS. Mitigations stack: shard by region to isolate the metro, split the hot cell to a finer resolution so its load spreads across more partitions, and cache the popular viewport query with a short TTL since many users request the same area at once.
?“Why not just sort everyone by distance?”Reveal
Computing distance to every object is O(N) per query and ignores the index entirely — it's the BETWEEN problem in a different shape. The spatial index limits the candidate set to objects actually near the point (cells/box), and only then do I compute exact distance on that small set. Recall is cheap and index-backed; the expensive distance/ETA math runs on a bounded candidate set.
Canonical examples
- →Uber / Lyft driver matching
- →DoorDash / Deliveroo restaurant discovery
- →Yelp / Google Maps "nearby" search
- →Pokémon Go / AR proximity triggers
- →ATM / store locator, geofenced notifications
Variants
PostGIS (GiST / SP-GiST)
R-tree spatial index in Postgres — durable truth, points and polygons, ST_DWithin in milliseconds.
PostGIS holds durable state; a per-region Redis GEO tier serves sub-millisecond proximity reads; dispatch falls back to PostGIS if Redis is gone.
The default when you're already on Postgres. A geography column with a GiST index supports ST_DWithin (radius), <-> KNN ordering (nearest-K), and full polygon operations (point-in-region, intersections). One system gives you durability, transactional consistency with the rest of your data, and rich geometry.
The critical detail is `geography` vs `geometry`: geometry does planar (flat-plane) math, which is wrong for lat/lng over any real distance; geography does geodesic math on the sphere, which is correct. Mixing them silently produces wrong distances. Use geography for lat/lng proximity.
PostGIS comfortably handles up to tens of millions of points on one well-indexed node. Past that, or when you need sub-millisecond dispatch latency, it becomes the durable truth behind a faster hot tier rather than the hot path itself.
Pros
- +Durable, transactional, and handles polygons + complex shapes
- +One system — no separate store to keep in sync
- +ST_DWithin and KNN (<->) both index-backed
Cons
- −Higher latency than an in-memory hot tier
- −Single index strains past tens of millions of points
- −geography vs geometry mistakes break distances silently
Choose this variant when
- You're already on Postgres and want one durable store
- You need polygons / geofencing, not just points
- Up to tens of millions of points, moderate QPS
Redis GEO hot tier
Geohash sorted set with sub-millisecond GEOSEARCH — the dispatch fast path, sharded by region.
A single global index outgrows a node; partition by city/region so each Redis shard stays small and dispatch routes by pickup location.
Redis GEO stores points in a sorted set keyed by their geohash, so GEOSEARCH ... BYRADIUS returns nearby members in sub-millisecond time. This is exactly what a real-time dispatch loop needs: drivers GEOADD their position every few seconds, and each rider request runs a radius search against the hot set.
Roughly 1M points fit in ~200 MB, so shard by region (city, metro, country) to keep each shard small, hot, and independently scalable — and so a dense downtown doesn't make one shard the bottleneck. Redis is a derived cache, not the truth: it's rebuilt from the durable store on failure, and dispatch falls back to PostGIS (slower) while it repopulates. Attach a TTL / last_seen so stale driver positions age out automatically.
Pros
- +Sub-millisecond radius queries for the dispatch loop
- +Trivial GEOADD / GEOSEARCH API, JSON-free
- +Region sharding keeps each shard small and hot
Cons
- −Not durable — a derived cache that must be rebuildable
- −Points only; no polygons or complex geometry
- −Dense regions create hot shards needing finer splits
Choose this variant when
- Real-time dispatch needs sub-ms proximity reads
- High write rate of moving objects (drivers, couriers)
- You already have a durable truth store behind it
H3 / S2 cell index
Hierarchical cell ids over a key-value store for planet-scale, uniform-cell custom indexing.
Coarse cells mean fewer cells to query but more candidates to filter; fine cells mean precise candidates but a wider neighbour ring.
Google's S2 (spherical quadtree, square-ish cells) and Uber's H3 (hexagonal grid) reduce proximity to cell membership: each point maps to a cell id at a chosen resolution, "nearby" is "same or neighbouring cell" (kRing in H3), and the cell id is just a key in any distributed KV store. This decouples spatial indexing from any single database and scales horizontally to the whole planet.
Hexagons (H3) have a nice property: all six neighbours are equidistant from the centre, which makes ring-based "expand the search" logic uniform — unlike a square grid where edge and corner neighbours differ. Resolution is the key dial: coarse cells mean fewer cells to query but more candidates to distance-filter; fine cells mean precise candidates but a wider neighbour ring. You choose resolution by the query radius and the local density. This is the most application-logic-heavy option — you own the cell math, the neighbour expansion, and the distance filter — so reach for it only at the scale where PostGIS + Redis stops being enough.
Pros
- +Planet-scale, store-agnostic — cell id is just a KV key
- +Uniform hex neighbours simplify ring expansion (H3)
- +Decouples spatial indexing from any single DB
Cons
- −Most application logic — you own cells, rings, and filtering
- −Resolution choice is a real precision-vs-fan-out trade-off
- −Overkill below hundreds of millions of points
Choose this variant when
- Hundreds of millions of points / planet-scale dispatch
- You need a store-agnostic, horizontally-sharded index
- Uniform cell analytics (surge zones, supply heatmaps)
Elasticsearch geo_point
Geo filtering fused with full-text and aggregations when search and location combine.
Spatial index returns nearby candidates; business rules (availability, vehicle, ETA) filter; a scorer ranks the survivors.
When a query is "vegan restaurants within 2 km, open now, rated 4+," you're combining geo, full-text, structured filters, and facets — and Elasticsearch's geo_point / geo_distance query does it in one place alongside BM25 and aggregations. It indexes points with a BKD-tree and supports radius, bounding-box, and polygon filters as ordinary clauses.
As with all search engines, treat the index as a derived view of a durable truth store, fed asynchronously (CDC/outbox), never as the source of truth. This is the right call when location is one filter among several in a discovery experience, rather than the high-write dispatch loop Redis GEO is built for.
Pros
- +Geo + full-text + facets + aggregations in one query
- +Bounding-box, radius, and polygon filters built in
- +Great for discovery UX (filter + sort + facet)
Cons
- −A derived index to keep fresh, not durable truth
- −Heavier write path than Redis for moving objects
- −Operational weight of a search cluster
Choose this variant when
- Geo is one filter among text + facets in a discovery UI
- Low-to-moderate location update rate (venues, not drivers)
- You already run Elasticsearch for search
Scaling path
v1 — PostGIS single spatial index
Correct proximity queries from day one with one durable store.
Add a geography column, a GiST index, and query with ST_DWithin for radius or ORDER BY location <-> point LIMIT k for nearest-K. Both are index-backed and return in milliseconds. Durable, transactional, and handles polygons if you need geofencing.
PostGIS holds durable state; a per-region Redis GEO tier serves sub-millisecond proximity reads; dispatch falls back to PostGIS if Redis is gone.
This is genuinely the right answer up to tens of millions of points and moderate QPS. The trigger to move on is the real-time dispatch loop: when sub-millisecond proximity reads at high write rates start to strain a single Postgres node.
What triggers the next iteration
- Per-query latency too high for a tight dispatch loop
- High-frequency location writes contend with reads
- A single index strains past tens of millions of points
v2 — Redis GEO hot tier in front of PostGIS
Sub-millisecond dispatch reads while keeping durable truth.
Keep PostGIS as truth and add a Redis GEO hot tier. Driver pings update both — Postgres for durability, Redis for the fast path. Dispatch runs GEOSEARCH against Redis for sub-millisecond candidates and only touches Postgres on a Redis miss/outage.
Spatial index returns nearby candidates; business rules (availability, vehicle, ETA) filter; a scorer ranks the survivors.
The hot tier is a rebuildable cache: if Redis dies, dispatch degrades to PostGIS latency and Redis is repopulated from truth. Add TTL/last_seen so stale positions expire automatically.
What triggers the next iteration
- A single Redis instance fills as the fleet grows (~200 MB/1M points)
- Dense regions concentrate load on one instance
- Dual-writing truth + cache must stay consistent on failure
v3 — shard the hot tier by region
Keep each shard small, hot, and independently scalable.
Partition the Redis GEO tier by city/region so each shard holds one metro's points and dispatch routes a request to the shard for its pickup location. This keeps shards small, isolates a busy city from a quiet one, and lets you scale hot regions independently.
A single global index outgrows a node; partition by city/region so each Redis shard stays small and dispatch routes by pickup location.
Now you must handle shard-boundary queries: a pickup near a region edge searches the home shard and the adjacent one, then merges and distance-filters — the same neighbour-ring idea, one level up.
What triggers the next iteration
- Pickups near region edges must query adjacent shards
- A dense downtown cell is still a hot spot within its shard
- Routing logic must map coordinates → correct shard
v4 — H3/S2 cell index for planet scale
Store-agnostic, horizontally-sharded indexing with hot-cell control.
Move the hot path to H3/S2 cells over a distributed KV store. Each point's cell id is its shard key; proximity is kRing neighbour expansion plus a true-distance filter. Resolution is tuned per use case, and dense "hot cells" (stadium exits, airports) can be split to a finer resolution to spread load.
Coarse cells mean fewer cells to query but more candidates to filter; fine cells mean precise candidates but a wider neighbour ring.
This is Uber-scale territory. The cell grid also doubles as the substrate for supply/demand analytics and surge zones, since uniform cells aggregate cleanly. The cost is owning all the cell math yourself.
What triggers the next iteration
- Hot cells (events, airports) still need finer-resolution splits
- Resolution choice trades neighbour-ring width vs candidate count
- All cell/ring/filter logic lives in application code
Deep dives
Why two B-trees are not a spatial index
A lat/lng BETWEEN query fetches everything in a latitude band ∪ a longitude band and intersects — a huge candidate set far from the point.
The most common wrong answer is WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?, and understanding exactly why it fails is what separates a real answer.
A lat/lng BETWEEN query fetches everything in a latitude band ∪ a longitude band and intersects — a huge candidate set far from the point.
A B-tree on lat can efficiently return every row inside a latitude band; a B-tree on lng can return every row inside a longitude band. But these are two independent one-dimensional lookups. The query planner fetches one band and the other and intersects them — and the intersection of "everyone at this latitude" with "everyone at this longitude" is a candidate set proportional to the bands, not to the answer. In a dense city, a latitude band alone can contain hundreds of thousands of points stretching east-west across the whole map; you pay to fetch and intersect all of them to find the few actually near your point.
A true spatial index is inherently two-dimensional: an R-tree partitions the plane into nested bounding boxes so a query descends only into boxes that overlap the search region; a grid index (geohash/S2/H3) maps the 2-D point to a single locality-preserving cell id so nearby points share a key prefix. Either way the work is proportional to the number of points near the query, not to a band across the entire dataset. That's the whole game: collapse two dimensions into one index that preserves locality.
Cell boundaries: the classic trap (and the fix)
Two points near a cell edge have different cell ids. Always search the target cell and its ring, then filter by true distance.
Every grid-based index — geohash, S2, H3 — has the same Achilles' heel: two points can be metres apart yet fall in different cells, so a single-cell lookup silently misses near neighbours sitting just across an edge.
Two points near a cell edge have different cell ids. Always search the target cell and its ring, then filter by true distance.
The fix is universal and you should state it unprompted: query the target cell plus its ring of neighbours, then filter by the true distance. For geohash that's the 8 adjacent cells; for H3 it's kRing(cell, k); for S2 it's the neighbouring cells at your level. After gathering candidates from all those cells, compute the actual Haversine (great-circle) distance and drop anything beyond R.
For nearest-K the logic generalises to a ring-expansion: start at the target cell, expand outward ring by ring, maintaining a running top-K heap, and stop only when the nearest possible point in any unsearched ring is farther than your current K-th result. For a large radius you may need to query many rings — at which point a coarser resolution (bigger cells, fewer of them) is the better trade. This is precisely why resolution selection matters: it sets how many cells a given radius touches.
Separate durable truth from the rebuildable hot tier
PostGIS holds durable state; a per-region Redis GEO tier serves sub-millisecond proximity reads; dispatch falls back to PostGIS if Redis is gone.
A production location system has two very different jobs: remember where everything is (durable, analytical, recoverable) and answer "who's near me right now" in under a millisecond (hot, volatile, high-write). Trying to do both in one store is the architectural mistake.
PostGIS holds durable state; a per-region Redis GEO tier serves sub-millisecond proximity reads; dispatch falls back to PostGIS if Redis is gone.
So you layer them. PostGIS (or any durable spatial store) is the truth: every driver ping is persisted for recovery, billing, trip history, and analytics. Redis GEO is the hot tier: a derived, in-memory index that serves the dispatch loop and can be thrown away and rebuilt from truth at any time. Dispatch reads Redis; if Redis is unavailable, it falls back to PostGIS at higher latency rather than failing — graceful degradation, not an outage.
The discipline that makes this safe: the hot tier must be reconstructible from truth, writes go to truth first (or at least durably), and stale entries age out via TTL/last_seen so a crashed client's last position doesn't haunt dispatch forever. The same two-tier shape recurs everywhere in this curriculum — a durable system of record behind a fast derived view — and naming it explicitly signals senior thinking.
Dense areas create hot partitions — and average QPS hides them
A single global index outgrows a node; partition by city/region so each Redis shard stays small and dispatch routes by pickup location.
Geospatial load is brutally non-uniform. A stadium at closing time, an airport arrivals curb, or a downtown core at rush hour concentrates thousands of drivers and riders into a handful of cells. Your average global QPS looks fine while one cell is melting.
A single global index outgrows a node; partition by city/region so each Redis shard stays small and dispatch routes by pickup location.
The mitigations stack:
- Shard by region so a busy metro is isolated from quiet ones and can scale independently.
- Split hot cells to a finer resolution dynamically, so the load of one dense area spreads across more partitions instead of one.
- Cache popular viewport queries (the map everyone is staring at) with a short TTL, since many users request the same area simultaneously.
- Track per-cell query and update rates, not just global averages — the metric that reveals the hot spot is the per-partition one.
There's also a demand-side angle: those same dense cells are where surge pricing and supply-positioning algorithms run, because the uniform cell grid is a natural substrate for aggregating supply and demand. So the cell index isn't only a lookup structure — at scale it's also the coordinate system for the marketplace's balancing logic.
Freshness, staleness, and the heartbeat trade-off
Each location carries a last_seen timestamp; dispatch excludes stale drivers; heartbeat cadence trades battery for accuracy.
Moving objects make location data perishable. A driver's position from 40 seconds ago is worse than useless for dispatch — it can send a rider to where the car was. So freshness is a first-class design concern, not an afterthought.
Each location carries a last_seen timestamp; dispatch excludes stale drivers; heartbeat cadence trades battery for accuracy.
Every stored position carries a `last_seen` timestamp (and often a TTL so it auto-expires). Dispatch filters out anyone whose last_seen is older than a threshold (say, 10 seconds) so it only ever offers genuinely-present drivers. The tension is the heartbeat cadence: pinging every second gives near-perfect accuracy but drains battery and floods the write path with millions of GEOADDs; pinging every 30 seconds is cheap but lets dispatch act on stale data. Production systems tune this adaptively — faster heartbeats when a driver is on a trip or near a likely match, slower when idle in a quiet area.
This also shapes the write architecture: at millions of moving objects, the location-update stream is a high-throughput firehose in its own right. It's common to ingest pings through a queue or stream (so a burst doesn't overwhelm the store), debounce/aggregate them, and only materialise the latest position into the hot tier — turning a write-heavy firehose into a manageable "latest known position per object" view.
Proximity is recall — dispatch is recall + filter + rank
Spatial index returns nearby candidates; business rules (availability, vehicle, ETA) filter; a scorer ranks the survivors.
"Find drivers within 2 km" is only the first stage of a real matching system, and strong candidates make the distinction explicit: the spatial index provides recall, but the actual decision adds filtering and ranking.
Spatial index returns nearby candidates; business rules (availability, vehicle, ETA) filter; a scorer ranks the survivors.
- 1Spatial recall.
GEOSEARCH(orST_DWithin/kRing) returns every driver inside the radius — a cheap, index-backed candidate set, typically over-fetched a little so the next stages have room to filter. - 2Business filter. Drop anyone who isn't available, doesn't match the requested vehicle type, is stale (old
last_seen), or is already on a trip. This is a yes/no pass that doesn't care about ranking. - 3Rank. Score the survivors by what actually matters — usually ETA (which accounts for road network and traffic, not just straight-line distance), plus driver rating, acceptance likelihood, and marketplace fairness — and offer the trip to the best one.
The crucial insight is that geometric distance is a proxy, not the objective: a driver 1.5 km away across a river with no nearby bridge has a worse ETA than one 2 km away on a direct road. So the spatial index deliberately returns a generous candidate set, and the ranking stage applies the expensive, road-aware ETA model to just those candidates — the same cheap-recall-then-expensive-rank shape you see in search. Keeping these stages separate keeps the hot spatial query fast while letting the matching logic be as smart as it needs to be.
Decision levers
Primary / truth store
PostGIS if you're on Postgres (durable, polygons, transactional). MongoDB 2dsphere if you're on Mongo. For pure high-scale location workloads, a durable truth store behind a Redis-first or H3-cell hot path.
Hot-path tier
Redis GEO for sub-millisecond reads (~200 MB per 1M points). Shard by city/region to keep each shard small and isolate dense areas. The hot tier is a rebuildable cache, never the truth.
Index family
Tree-based (PostGIS GiST/R-tree) for rich geometry on one node. Cell-based (geohash/S2/H3) for store-agnostic, horizontally-sharded, planet-scale indexing. Both require a true-distance filter after the coarse query.
Query shape
Radius (ST_DWithin / GEOSEARCH BYRADIUS) and nearest-K (<-> / ring expansion) for dispatch; bounding box for map viewports; polygon/containment for geofencing. Pick the index that serves your dominant shape.
Resolution (cell systems)
Coarse cells = fewer cells queried but more candidates to filter; fine cells = precise candidates but a wider neighbour ring. Tune by query radius and local density; split hot cells to finer resolution to spread load.
Freshness
Attach last_seen/TTL to every position; exclude stale objects from dispatch. Tune heartbeat cadence (battery + write load vs accuracy), adaptively faster on-trip and slower when idle. Ingest the ping firehose through a stream at scale.
Scale tier
< ~10M points: single PostGIS index. 10–100M: sharded PostGIS + Redis GEO hot tier. 100M+ with dispatch QPS: H3/S2 cells over a distributed store (Uber territory).
Failure modes
Two independent B-trees intersected — fetches whole latitude/longitude bands and dies past ~10k rows, with cost proportional to the bands, not the answer. Always use a real spatial index.
Two points metres apart across a cell edge have different cell ids, so a single-cell lookup silently misses them. Query the target cell plus its neighbour ring, then filter by true distance.
Cells and bounding boxes are approximations — a corner of the queried region can be well beyond R. Always compute the actual Haversine distance and drop candidates outside the radius.
PostGIS geometry uses planar math (wrong for lat/lng over distance); geography uses geodesic math (correct). Mixing them silently returns wrong distances.
Redis GEO is a derived, volatile cache. If it's the only copy, a flush loses every location. Persist to a durable store and make the hot tier rebuildable, with PostGIS fallback on outage.
100M+ points in a single index strains the node and concentrates dense-area load. Shard by region and split hot cells to finer resolution.
Dispatching on a 40-second-old position sends riders to where a driver was. Attach last_seen/TTL, exclude stale objects, and tune heartbeat cadence.
Case studies
Uber
Uber — H3 hexagonal grid as the coordinate system for the marketplace
Uber built and open-sourced H3, a hierarchical hexagonal geospatial index, because the obvious alternatives didn't fit a planet-scale, real-time marketplace. Latitude/longitude with range queries doesn't scale; square grids (and geohash) have the awkward property that a cell's edge neighbours and corner neighbours are at different distances, which complicates "expanding ring" searches and smoothing.
Hexagons solve this: every one of a hexagon's six neighbours is equidistant from the centre, so ring-based neighbour expansion (kRing) and spatial smoothing are uniform. H3 is hierarchical — each cell subdivides into finer children — so Uber can pick a resolution per task: coarse cells for city-level supply/demand analytics, fine cells for precise dispatch.
Critically, H3 is more than a lookup structure for Uber — it's the coordinate system for the entire marketplace. Supply and demand are bucketed into cells to compute surge pricing, position idle drivers toward predicted demand, and visualise the network. Because cells are uniform and store-agnostic (a cell id is just an integer key), the same index sharded across a distributed store serves both the sub-second dispatch query and the batch analytics that balance the market.
The lesson for an interview: at planet scale the spatial index isn't a detail bolted onto a database — it's the foundational coordinate system, and the choice of cell shape and resolution has consequences far beyond "find nearby."
Google S2 — projecting the sphere onto a quadtree of cells
Google's S2 library underpins geospatial features across Google Maps, and its core idea is elegant: project the sphere onto the six faces of a cube, then recursively subdivide each face into a quadtree of cells, numbering them along a Hilbert space-filling curve. The Hilbert curve is the trick — it maps 2-D cells to 1-D 64-bit integer ids such that cells close in space stay close in id, so a spatial range becomes a small number of contiguous integer ranges.
That property is what makes S2 so practical: because each cell is a single sortable integer that preserves locality, you can index points in an ordinary B-tree or any key-value store and answer "points near here" as a handful of integer range scans — no specialised spatial datastore required. S2 cells are hierarchical (30 levels, from the whole face down to roughly a square centimetre) so you choose resolution by use case, and the library provides the neighbour, covering, and containment operations a region query needs.
S2 covers an arbitrary region (a circle, a polygon) by computing the minimal set of cells that tile it, then querying those cell-id ranges — the covering plus a true-distance filter. The takeaway parallels H3: reduce 2-D proximity to 1-D locality-preserving keys so commodity stores can serve spatial queries, and let resolution be a tunable dial.
Tinder
Tinder — geosharding recommendations by location
Tinder's core query is intensely geospatial: show a user a stack of nearby, eligible profiles, fast, at enormous scale. Their engineering team described moving to geosharding — partitioning the recommendation index by geographic region rather than by a random user-id hash.
The motivation is locality of access: a user in a given city overwhelmingly needs candidates from that same region, so co-locating each region's profiles on the same shard means a recommendation query hits one (or a few adjacent) shards instead of fanning out across the whole fleet. They map regions using a geohash/S2-style cell scheme and assign cells to shards, balancing shard sizes against population density — a dense metropolitan cell may be its own shard while a sparse rural area groups many cells together. This is the exact density-aware sharding this pattern describes: uniform cells, non-uniform population, so shard boundaries follow demand, not geometry.
Boundary handling matters here too — a user near a region edge must draw candidates from neighbouring shards — which is the cell-neighbour problem one level up at the shard granularity. The payoff was dramatically reduced query fan-out and latency: most recommendation requests became local to a shard.
Decision table
Geospatial choices depend on query shape, update rate, durability, and point count.
| Choice | Best for | Trade-off | Robust answer includes |
|---|---|---|---|
| PostGIS (GiST) | Durable truth + polygons/geofencing | Higher latency than an in-memory tier | geography vs geometry, ST_DWithin, KNN <-> |
| Redis GEO | Sub-ms dispatch reads, high write rate | Derived cache, not durable | Region sharding, TTL/last_seen, rebuild path |
| H3 / S2 cells | Planet-scale, store-agnostic indexing | You own cell/ring/filter logic | Resolution choice, kRing neighbours, distance filter |
| Elasticsearch geo | Geo + text + facet discovery | Derived index, heavier writes | DB truth, async indexing, geo_distance filter |
- Whatever the index, always filter candidates by true (Haversine) distance after the cell/box query.
- At scale, separate durable truth (PostGIS) from the rebuildable hot tier (Redis GEO / H3 cells).
Drills
Why does `lat/lng BETWEEN` fail?Reveal
It's two independent B-tree lookups intersected. The latitude B-tree returns everyone in a latitude band, the longitude B-tree everyone in a longitude band, and you fetch and intersect both — so the cost is proportional to the bands (potentially the whole map east-west), not to the handful of points actually near you. A spatial index walks only the cells/boxes overlapping the query region, so work scales with the answer.
Explain the cell-boundary problem and its fix.Reveal
Grid indexes (geohash/S2/H3) assign each point a cell id, but two points metres apart can sit on opposite sides of a cell edge and get different ids — so querying only the target cell silently misses near neighbours. The fix is universal: query the target cell plus its ring of neighbours (the 8 adjacent cells / kRing / S2 neighbours), gather all candidates, then filter by the true Haversine distance to drop anything beyond R.
How do you do nearest-K on a cell index?Reveal
Ring expansion with a running top-K heap. Start at the target cell, search outward ring by ring, keep the K closest seen so far, and stop only when the nearest possible point in any unsearched ring is farther than your current K-th result — at which point no further ring can improve the answer. In PostGIS the ORDER BY location <-> point LIMIT k operator does this natively via the GiST index.
Why separate PostGIS truth from a Redis GEO hot tier?Reveal
Two different jobs: durable, recoverable, analytical storage (PostGIS) versus sub-millisecond high-write dispatch reads (Redis GEO). Redis is a derived, volatile cache — fast but losable — so it can be thrown away and rebuilt from truth, and dispatch falls back to PostGIS (slower but correct) if Redis is down. Making Redis the only copy would mean a flush loses every location.
Why rank dispatch by ETA instead of straight-line distance?Reveal
Geometric distance is a proxy for what actually matters — how fast a driver can reach the rider. A driver 1.5 km away across a river with no nearby bridge has a worse real ETA than one 2 km away on a direct road. So the spatial index returns a generous candidate set by distance (cheap recall), and the ranking stage applies a road- and traffic-aware ETA model to just those candidates (expensive, but bounded).
A downtown cell is overloaded but global QPS looks fine. Diagnose and fix.Reveal
Geospatial load is non-uniform — a dense area concentrates thousands of objects and queries into a few cells while averages look healthy. Diagnose by tracking per-cell query and update rates, not global QPS. Fix by sharding by region to isolate the metro, splitting the hot cell to a finer resolution so its load spreads across more partitions, and caching the popular viewport query with a short TTL since many users request the same area simultaneously.
When to reach for this