Chapter 1: Design a Proximity Service

volume2 proximity geospatial geohash quadtree

Status: 🟩 Interview ready
Difficulty: Medium-Hard
Time to complete: 50 min read + practice


Overview

A proximity service finds nearby points of interest (businesses, restaurants, venues) given a user’s current location and a search radius. Think Yelp, Google Places, or the β€œrestaurants near me” feature inside any map app.

Why this matters:

  • Very common interview question (Uber, Yelp, Google, Meta frequently ask this)
  • Combines database design, geospatial indexing, caching, and API design
  • Teaches real-world geospatial techniques used in production systems

Problem Statement

Design a proximity service that:

  • Accepts a user’s latitude/longitude and a search radius
  • Returns a ranked list of businesses within that radius
  • Handles 100M daily active users and 200M businesses
  • Keeps search latency low (read-heavy, write-light workload)

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

  • What radii do we support? β†’ Fixed options: 500m, 1km, 2km, 5km, 10km, 20km
  • Is location data real-time? β†’ Near real-time is fine; business moves rarely
  • Do businesses update their own location? β†’ Yes, but changes propagate with slight delay (acceptable)
  • Return limit per query? β†’ Default 20 results per page, support pagination
  • What data is returned? β†’ business_id, name, address, lat/lng, rating, thumbnail

Scope:

  • Search businesses by location + radius
  • CRUD for business owners to manage listings
  • Read path and write path are separate concerns

Non-Functional Requirements

  • Low latency: Search results in < 200ms
  • High availability: Downtime loses searches β†’ revenue impact
  • High read throughput: Read-to-write ratio ~100:1
  • Scalability: 100M DAU, 200M businesses

Capacity Estimation

Daily Active Users:     100M
Searches per user/day:  ~5
Total daily searches:   500M
QPS (read):             500M / 86,400 sec β‰ˆ 5,800 QPS
Peak QPS (read):        ~3x average       β‰ˆ 17,400 QPS

Business writes (updates, new listings):
  200M businesses / day (spread evenly)   β‰ˆ very low write QPS (~200 writes/sec)

Read-to-write ratio:    ~30:1 (heavily read-skewed)

Storage estimate:

200M businesses Γ— ~1 KB per record β‰ˆ 200 GB (business metadata)
Geospatial index (geohash strings + IDs): additional ~50 GB
Total: ~250 GB β€” fits in a modest PostgreSQL cluster with replicas

Step 2: High-Level Design (10 min)

API Design

Search nearby businesses:

GET /v1/search?lat={lat}&lon={lon}&radius={radius}&business_type={type}&page={page}

Parameters:
  lat           float   required   User latitude  (e.g., 37.7749)
  lon           float   required   User longitude (e.g., -122.4194)
  radius        int     optional   Meters, default 2000. Options: 500|1000|2000|5000|10000|20000
  business_type string  optional   Filter: "restaurant"|"gas_station"|"hotel"
  page          int     optional   Pagination cursor, default 1

Response 200 OK:
{
  "total": 142,
  "businesses": [
    {
      "business_id": "biz_abc123",
      "name": "Tartine Bakery",
      "address": "600 Guerrero St, San Francisco, CA",
      "lat": 37.7616,
      "lon": -122.4241,
      "rating": 4.7,
      "distance_meters": 380,
      "thumbnail_url": "https://cdn.example.com/biz_abc123.jpg"
    },
    ...
  ],
  "next_page": 2
}

Business CRUD (for business owners):

POST   /v1/businesses                    Create a business listing
GET    /v1/businesses/{business_id}      Get business details
PUT    /v1/businesses/{business_id}      Update business (name, address, hours)
DELETE /v1/businesses/{business_id}      Remove listing

Component Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                            Clients                                   β”‚
β”‚                     (Mobile / Web / Partners)                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  Load Balancer  β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚                          β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Location-Based     β”‚   β”‚   Business Service  β”‚
    β”‚  Service (LBS)      β”‚   β”‚   (CRUD)            β”‚
    β”‚  (read-only,        β”‚   β”‚   (writes go here)  β”‚
    β”‚   stateless)        β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β”‚
               β”‚                         β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Geospatial Index   β”‚   β”‚   Business DB        β”‚
    β”‚  (Redis / PostGIS)  β”‚   β”‚   (PostgreSQL)       β”‚
    β”‚  Read-heavy         β”‚   β”‚   Write + Read       β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                         β”‚
                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                              β”‚   Read Replicas      β”‚
                              β”‚   (scale reads)      β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Two-database approach:

DatabasePurposeCharacteristics
Business DB (PostgreSQL)Business metadata: name, address, hours, phoneWrite-heavy; normalized; ACID
Geospatial Index (Redis/PostGIS)Spatial lookup: geohash β†’ list of business_idsRead-heavy; can tolerate eventual consistency

Location-Based Service (LBS):

  • Stateless β†’ horizontally scalable
  • No business logic, pure query layer
  • Translates (lat, lng, radius) β†’ geohash prefix β†’ business_ids β†’ fetch details

Step 3: Deep Dive (20 min)

Approach 1: Naive β€” Full Table Scan (Too Slow!)

SELECT business_id, name, lat, lng,
       distance(lat, lng, :user_lat, :user_lng) AS dist
FROM businesses
WHERE distance(lat, lng, :user_lat, :user_lng) <= :radius
ORDER BY dist
LIMIT 20;

Problem:

  • Computes distance for every row β€” O(n)
  • 200M rows β†’ unusable even with parallelism
  • No index on a computed column

Approach 2: Two-Dimensional Index (Better but Still Slow)

Add separate indexes on lat and lng:

CREATE INDEX idx_lat ON businesses(lat);
CREATE INDEX idx_lng ON businesses(lng);
 
-- Bounding box query (approximate)
SELECT * FROM businesses
WHERE lat BETWEEN :min_lat AND :max_lat
  AND lng BETWEEN :min_lng AND :max_lng;

Problem: Database can efficiently use ONE index at a time. Combining two range indexes (β€œindex merge”) is expensive. Still scans too many rows for dense areas.


Approach 3: Evenly Divided Grid

Divide the world into a fixed NΓ—N grid. Store each business with its grid cell ID. Query a cell + adjacent cells.

β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚ 0,3 β”‚ 1,3 β”‚ 2,3 β”‚ 3,3 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ 0,2 β”‚ 1,2 β”‚ 2,2 β”‚ 3,2 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ 0,1 β”‚ 1,1 β”‚ 2,1 β”‚ 3,1 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ 0,0 β”‚ 1,0 β”‚ 2,0 β”‚ 3,0 β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

Problem: Uneven distribution. Manhattan has millions of businesses; Sahara desert has zero. Fixed grid is wasteful and inefficient for dense areas.


Core idea: Encode the entire globe into a hierarchy of grid cells using a Base32 string. Longer string = smaller, more precise cell.

How geohash works:

  1. Interleave latitude bits and longitude bits alternately
  2. Encode the resulting bit string in Base32
  3. Result is a short alphanumeric string where common prefix = geographic proximity

Geohash precision table:

LevelLengthCell Size (approx)Use Case
11 char5,000 km Γ— 5,000 kmCountry level
22 chars1,250 km Γ— 625 kmRegion
33 chars156 km Γ— 156 kmCity
44 chars39 km Γ— 20 kmDistrict
55 chars4.9 km Γ— 4.9 km~5km radius
66 chars1.2 km Γ— 0.6 km~1km radius
77 chars152 m Γ— 152 mStreet level

Radius β†’ geohash level mapping:

Radius  β†’ Geohash level
500m    β†’ level 6  (0.6km cell)
1km     β†’ level 6
2km     β†’ level 5  (4.9km cell)
5km     β†’ level 5
10km    β†’ level 4  (20km cell)
20km    β†’ level 4

Geohash example:

San Francisco (37.7749, -122.4194):
  Level 6 geohash: "9q8yy"
  Level 5 geohash: "9q8yy" β†’ prefix "9q8y"
  Level 4 geohash: prefix "9q8"

Nearby restaurant (37.7616, -122.4241):
  Level 6 geohash: "9q8yw"
  β†’ Shares prefix "9q8y" at level 5 (same area!)

Query algorithm:

1. Convert user (lat, lng) to geohash at precision P
   e.g., "9q8yy" at precision 6

2. Get 8 neighboring geohashes:
   ["9q8yz", "9q8yu", "9q8yv", "9q8yg",
    "9q8ys", "9q8yt", "9q8yw", "9q8yx"]

3. Query Redis for all businesses in 9 cells (center + 8 neighbors):
   ZRANGEBYLEX geohash_index "[9q8yy" "[9q8yy\xff"
   (repeat for all 9 cells)

4. For each candidate business_id, fetch metadata from Business DB

5. Compute exact distance, filter to radius, sort by distance

6. Return top 20

Why 9 cells? Boundary problem: A user standing at the edge of geohash cell X might miss businesses in the adjacent cell Y even though they are very close. Querying all 8 neighbors guarantees complete coverage.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ NW nbr  β”‚  N nbr  β”‚ NE nbr  β”‚
β”‚ 9q8yg   β”‚ 9q8yu   β”‚ 9q8yv   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  W nbr  β”‚  CENTER β”‚  E nbr  β”‚
β”‚ 9q8ys   β”‚ 9q8yy   β”‚ 9q8yz   β”‚  ← Always query all 9!
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ SW nbr  β”‚  S nbr  β”‚ SE nbr  β”‚
β”‚ 9q8yt   β”‚ 9q8yw   β”‚ 9q8yx   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Redis data structure for geohash index:

Sorted Set: geohash_index
  Key:    geohash string (lexicographic)
  Score:  0 (all same score β†’ sorted lexicographically by member)
  Member: "9q8yy:biz_abc123"   (geohash:business_id)

# Insert a business:
ZADD geohash_index 0 "9q8yy:biz_abc123"

# Query all businesses with prefix "9q8yy":
ZRANGEBYLEX geohash_index "[9q8yy" "[9q8yy\xff"
β†’ ["9q8yy:biz_abc123", "9q8yy:biz_def456", ...]

Alternative: Redis GEOADD/GEODIST/GEORADIUS (built-in geospatial):

GEOADD locations 37.7616 -122.4241 "biz_abc123"
GEORADIUS locations 37.7749 -122.4194 2 km ASC COUNT 20
  • Redis internally uses geohash for storage
  • GEORADIUS is convenient but has less fine-grained control

Geohash edge case β€” boundary split:

Problem: Two adjacent cells share a border. A business 10m across
the border from a user appears in a different geohash. Without
neighbor queries, it would be missed.

Solution: Always query 9 cells (center + 8 neighbors).
This adds a small overhead but ensures correctness.

Approach 5: QuadTree (Good for Custom Implementations)

Core idea: Recursively subdivide the world into quadrants. Split a cell when it contains more than a threshold number of businesses (e.g., 100).

World
β”œβ”€β”€ NW Quadrant (sparse) β€” leaf at level 2
β”œβ”€β”€ NE Quadrant
β”‚   β”œβ”€β”€ NW sub-quadrant (medium density) β€” leaf
β”‚   β”œβ”€β”€ NE sub-quadrant
β”‚   β”‚   β”œβ”€β”€ NW (dense, e.g., Manhattan) β€” leaf
β”‚   β”‚   β”œβ”€β”€ NE β€” leaf
β”‚   β”‚   β”œβ”€β”€ SW β€” leaf
β”‚   β”‚   └── SE β€” leaf
β”‚   β”œβ”€β”€ SW β€” leaf
β”‚   └── SE β€” leaf
β”œβ”€β”€ SW Quadrant β€” leaf
└── SE Quadrant β€” ...

QuadTree properties:

PropertyValue
Max businesses per leaf100 (configurable threshold)
Build timeO(n log n) β€” each business inserted once per level
SpaceO(n)
Query timeO(log n + k) where k = results returned
UpdateHard β€” must re-balance subtree

Build the tree:

function buildQuadTree(businesses, bounds):
    if len(businesses) <= MAX_LEAF_SIZE:
        return LeafNode(businesses, bounds)

    NW, NE, SW, SE = splitIntoBounds(bounds)
    return InternalNode(
        NW = buildQuadTree(filter(businesses, NW), NW),
        NE = buildQuadTree(filter(businesses, NE), NE),
        SW = buildQuadTree(filter(businesses, SW), SW),
        SE = buildQuadTree(filter(businesses, SE), SE)
    )

Query:

function search(node, point, radius):
    if not overlaps(node.bounds, circle(point, radius)):
        return []
    if isLeaf(node):
        return filter(node.businesses, distance <= radius)
    return (search(node.NW, ...) + search(node.NE, ...) +
            search(node.SW, ...) + search(node.SE, ...))

QuadTree pros and cons:

ProsCons
Adapts to data density (more cells in NYC, fewer in Montana)Hard to update incrementally (business moves = tree rebuild)
Efficient for irregular distributionsMust be rebuilt when data changes significantly
Entire tree fits in memory (~1-2 GB for 200M businesses)Complex to implement correctly in an interview
No precision loss β€” exact geometryNot as cache-friendly as geohash

When to use QuadTree: When you control the full stack and have heavily non-uniform data distribution. Geohash is simpler for interviews.


Approach 6: Google S2 (Advanced, Know the Concept)

Core idea: Map the sphere (Earth) onto a cube, then use a Hilbert space-filling curve to assign a 1D integer to each 2D cell. The Hilbert curve has excellent locality β€” nearby cells have nearby integer IDs.

Sphere β†’ Cube face projection β†’ Divide face into grid β†’ Hilbert curve ordering

Hilbert curve property:
  Points that are geographically close β†’ integers that are numerically close
  This makes range queries on a 1D integer work for 2D geography!

S2 key properties:

  • Cells at 30 levels of granularity (1 cmΒ² to entire Earth)
  • Better area distortion than geohash (more uniform cell sizes)
  • Used by Google Maps, Foursquare, PokΓ©mon GO
  • More complex to implement from scratch β€” use as knowledge, not interview code

Algorithm Comparison

ApproachComplexityProsConsUse
Full scanO(n)SimpleWay too slowNever
2D indexO(n) effectiveEasyIndex merge expensiveNever
Fixed gridO(1) lookupSimpleUneven distributionNo
GeohashO(log n)Simple strings, Redis-native, easy prefix searchBoundary problem (solved by 9-cell query), cells not equal sizeYes β€” interview default
QuadTreeO(log n + k)Adaptive densityHard to update, complex to codeYes β€” if custom infra
S2O(log n)Best coverage, uniform cellsVery complexKnow concept only

Business Data: Read/Write Separation

Write path (low frequency):
  Business owner β†’ Business Service β†’ Primary DB (PostgreSQL)
                                    β†’ Invalidate / update geospatial index

Read path (high frequency):
  User search β†’ LBS β†’ Redis Geospatial Index β†’ business_ids
              β†’ Business Service β†’ Read Replica β†’ full business details

Database schema:

-- Business metadata table
CREATE TABLE businesses (
    business_id  VARCHAR(36) PRIMARY KEY,
    owner_id     VARCHAR(36) NOT NULL,
    name         VARCHAR(255) NOT NULL,
    address      TEXT,
    lat          DECIMAL(10, 8) NOT NULL,
    lng          DECIMAL(11, 8) NOT NULL,
    category     VARCHAR(50),
    rating       DECIMAL(3, 1),
    is_active    BOOLEAN DEFAULT true,
    created_at   TIMESTAMP DEFAULT NOW(),
    updated_at   TIMESTAMP DEFAULT NOW()
);
 
CREATE INDEX idx_businesses_category ON businesses(category);
CREATE INDEX idx_businesses_lat_lng ON businesses(lat, lng); -- for bounding box fallback

Caching Strategy

What to cache:

Cache LayerKeyValueTTL
Geospatial indexRedis Sorted Set (geohash_index)geohash:business_id pairsNo TTL β€” updated on write
Business detailsbiz:{business_id}Full business JSON24 hours
Search resultssearch:{geohash6}:{category}List of business_ids10 minutes
User locationuser:{user_id}:geohashCurrent geohash of user5 minutes

Cache invalidation:

When a business updates location:
  1. Update primary DB
  2. Remove old geohash entry from Redis: ZREM geohash_index "old_hash:biz_id"
  3. Add new geohash entry:             ZADD geohash_index 0 "new_hash:biz_id"
  4. Invalidate business detail cache:  DEL biz:{business_id}
  5. Invalidate affected search caches: DEL search:{old_geohash6}:*
                                        DEL search:{new_geohash6}:*

Handling Scale

LBS horizontal scaling:

LBS is stateless β†’ add more instances behind load balancer

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Load Balancer              β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     β”‚          β”‚          β”‚
  β”Œβ”€β”€β–Όβ”€β”€β”   β”Œβ”€β”€β–Όβ”€β”€β”   β”Œβ”€β”€β–Όβ”€β”€β”
  β”‚LBS 1β”‚   β”‚LBS 2β”‚   β”‚LBS 3β”‚   ... (add more on demand)
  β””β”€β”€β”¬β”€β”€β”˜   β””β”€β”€β”¬β”€β”€β”˜   β””β”€β”€β”¬β”€β”€β”˜
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚   Redis Cluster β”‚   (sharded by geohash prefix)
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Redis sharding for geospatial index:

Shard by first character of geohash (16 possible first chars in Base32):
  Shard 0: geohash prefix "0" - "3"
  Shard 1: geohash prefix "4" - "7"
  Shard 2: geohash prefix "8" - "b"
  Shard 3: geohash prefix "c" - "z"

This distributes global queries evenly across shards.

Read replica strategy:

Primary DB (writes) β†’ replicate β†’ Read Replica 1
                                β†’ Read Replica 2
                                β†’ Read Replica 3

LBS always reads from replicas.
Business Service writes to primary only.
Replication lag < 1 second (acceptable for near-real-time updates).

Design Summary

Final Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       Client (Mobile/Web)                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚  GET /v1/search?lat=&lon=&radius=
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚  Load Balancer  β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚  Search Path                     β”‚  Write Path
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Location-Based     β”‚          β”‚    Business Service         β”‚
    β”‚  Service (LBS)      β”‚          β”‚    (CRUD, Owner API)        β”‚
    β”‚  [Stateless, N pods]β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                        β”‚
               β”‚                                   β”‚ read/write
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Redis Cluster      β”‚          β”‚  PostgreSQL Primary         β”‚
    β”‚  Geospatial Index   │◄─────────│  (business metadata)        β”‚
    β”‚  (geohash β†’ biz_id) β”‚  sync    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β”‚                     β”‚                        β”‚ replicate
    β”‚  Also: detail cache β”‚          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚  Read Replicas (Γ—3)         β”‚
                                     β”‚  (LBS reads details here)   β”‚
                                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Decisions Summary

DecisionChoiceReasoning
Geospatial algorithmGeohashSimple strings, Redis ZRANGEBYLEX, easy neighbor lookup
Geospatial storageRedis Sorted SetFast prefix scans, in-memory, horizontally scalable
Business metadataPostgreSQL + Read ReplicasACID for writes, scale reads horizontally
LBS designStateless microserviceHorizontal scaling for high read QPS (5,800 peak)
CacheRedis (detail + search result cache)Sub-millisecond reads, TTL auto-expiry
Update strategyWrite-through + cache invalidationEventual consistency acceptable (near real-time)
Neighbor search9-cell query (center + 8 neighbors)Solves geohash boundary problem

Interview Questions & Answers

Q: Why geohash over a simple (lat, lng) index in the database?
A: A composite (lat, lng) B-tree index only helps with one dimension at a time. For a range query spanning both dimensions, the DB must merge two range scans, which is expensive. Geohash encodes 2D space into a 1D string where proximity is preserved β€” nearby locations share a common prefix. This turns a 2D spatial query into a simple string prefix scan, which Redis handles very efficiently with ZRANGEBYLEX.

Q: What is the geohash boundary problem and how do you solve it?
A: A user standing at the edge of geohash cell β€œ9q8yy” may be very close to a business in the adjacent cell β€œ9q8yz”, but a naive query on only β€œ9q8yy” would miss that business. The fix is to always query 9 cells: the user’s own cell plus all 8 immediate neighbors. After collecting all candidate business IDs, compute exact distance and filter to the requested radius. This adds a constant overhead (9Γ— the work) but guarantees no nearby business is missed.

Q: Geohash vs QuadTree β€” when would you choose each?
A: Geohash is simpler to implement, maps naturally to Redis strings, and is the right choice for most interviews and most production systems. QuadTree adapts better to highly non-uniform data distribution (e.g., ultra-dense urban areas vs sparse rural regions) and can be more memory-efficient in those cases. However QuadTree is harder to update incrementally β€” moving a business requires a subtree rebuild. Use geohash unless you have a strong reason to need the density adaptivity of a QuadTree.

Q: How do you handle a business that moves (e.g., a food truck)?
A: The update pipeline must be atomic in practice: (1) Write new lat/lng to the primary DB. (2) Remove the old geohash entry from the Redis sorted set. (3) Add the new geohash entry. (4) Invalidate any search-result caches that covered the old and new geohash cells. Replication lag to read replicas is acceptable (< 1 second). Users might get stale results for a few seconds, which is fine for a proximity service.

Q: How would you scale this to 10Γ— more users?
A: The LBS is already stateless β€” horizontally scale it by adding more pods. Redis can be clustered and sharded by geohash prefix. PostgreSQL read replicas handle the read volume. The only global bottleneck is the Redis geospatial index β€” if needed, partition it into regional clusters (US, EU, Asia) since users typically search nearby businesses and cross-region queries are rare.


Key Takeaways

  1. Geohash encodes 2D location as a 1D string β€” geographic proximity maps to string prefix proximity. This is the key insight enabling fast prefix-range searches.
  2. Always query 9 cells (center + 8 neighbors) to avoid missing businesses near cell boundaries.
  3. Separate read and write paths: LBS (reads) β†’ Redis + Read Replicas; Business Service (writes) β†’ Primary DB β†’ sync geospatial index.
  4. LBS is stateless β†’ trivially horizontally scalable to handle peak QPS (~17,400).
  5. Redis ZRANGEBYLEX is the natural fit for geohash prefix queries in production; Redis GEORADIUS is the higher-level convenience API.
  6. QuadTree is adaptive to density but harder to update; geohash is fixed-grid but simpler and battle-tested.
  7. Cache aggressively: business details (24h TTL), search results per geohash cell (10 min TTL), invalidate on any business update.


Last Updated: 2026-04-13
Status: 🟩 Interview ready β€” Geohash is the go-to answer; know the boundary problem cold!