Chapter 2: Design Nearby Friends

volume2 nearby-friends real-time websocket location

Status: 🟩 Interview ready
Difficulty: Hard
Time to complete: 55 min read + practice


Overview

Nearby Friends is a feature (e.g., Facebook Nearby Friends) that shows you which of your friends are physically close to you right now, updating in near-real-time as everyone moves. Unlike a proximity service where you query for static businesses, here the data changes constantly and must be pushed to users automatically.

Why this matters:

  • Tests real-time system design (WebSocket, Pub/Sub, fan-out)
  • Combines location tracking, social graph traversal, and streaming
  • Hard problem: fan-out at scale (millions of location updates per second become billions of fan-out events per second)

Problem Statement

Design a nearby friends feature that:

  • Shows which of your friends are within a configurable radius (default 5 miles)
  • Updates automatically in near-real-time as you and friends move
  • Only shares location with direct friends (not strangers)
  • Respects user privacy (opt-in, opt-out, location sharing controls)

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

  • What does “nearby” mean? → Default 5 miles, but user-configurable
  • How often do locations update? → Every 30 seconds when the user is actively moving
  • Is it opt-in or opt-out? → Users must opt in to share their location
  • Who can see my location? → Only mutual friends (friend list is bidirectional)
  • What data is shown? → Friend name, approximate distance (“0.5 miles away”), last updated time
  • Does it work in the background? → Yes, as long as the app has location permission

Scope:

  • Track and broadcast live location of opted-in users to their friends
  • Show friends within configurable radius, sorted by distance
  • Respect friendship graph and privacy settings
  • Handle friend going offline / location going stale

Non-Functional Requirements

  • Low latency: Location updates appear within a few seconds
  • High availability: Real-time features are conspicuous when they break
  • Scalability: 1B total users, 10% active on Nearby Friends = 100M users
  • Privacy: Location data is ephemeral (no permanent history required)
  • Eventual consistency: A friend appearing 5 sec late is fine; 60 sec is not

Capacity Estimation

Users using Nearby Friends:      100M active
Location update frequency:       1 update per 30 seconds
Location update QPS:             100M / 30 ≈ 3,333,333 ≈ 3.3M writes/sec

Average friends per user:        400
Fan-out per update:              400 subscribers receive each location update
Total fan-out events/sec:        3.3M × 400 = 1.32 billion ≈ 1.3B/sec
                                 (but most friends are not nearby, so filter aggressively!)

Effective push notifications/sec: After filtering to within 5 miles:
                                 assume 10% of friends are nearby → ~130M/sec
                                 (still very high: this is the hard part)

Location data size per update:
  user_id (8 bytes) + lat (8 bytes) + lng (8 bytes) + timestamp (8 bytes) = 32 bytes
  3.3M × 32 bytes ≈ 100 MB/sec write throughput to location store

Key insight: The fan-out problem is the dominant challenge. Raw fan-out is ~1.3B events/sec but most notifications are for friends who are not nearby and can be filtered server-side.
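The estimate can be reproduced with a few lines of arithmetic. This is a back-of-envelope sketch only; every input below is one of the assumptions stated above (100M active users, one update per 30 seconds, 400 friends, 10% nearby, 32-byte record).

```python
# Back-of-envelope capacity model; all inputs are the assumptions from the estimate above.
ACTIVE_USERS = 100_000_000       # users with Nearby Friends enabled
UPDATE_INTERVAL_SEC = 30         # one location update per user every 30 s
AVG_FRIENDS = 400                # average friend count
NEARBY_FRACTION = 0.10           # assumed share of friends inside the radius
RECORD_BYTES = 8 + 8 + 8 + 8     # user_id + lat + lng + timestamp

write_qps = ACTIVE_USERS / UPDATE_INTERVAL_SEC            # location writes per second
raw_fanout = write_qps * AVG_FRIENDS                      # fan-out events before filtering
filtered_pushes = raw_fanout * NEARBY_FRACTION            # pushes that survive the distance filter
write_mb_per_sec = write_qps * RECORD_BYTES / 1_000_000   # write throughput to the location store

for label, value in [
    ("location writes/sec", write_qps),
    ("raw fan-out events/sec", raw_fanout),
    ("pushes/sec after filtering", filtered_pushes),
    ("write throughput (MB/sec)", write_mb_per_sec),
]:
    print(f"{label:>28}: {value:,.0f}")
```

Carrying the unrounded 3.33M writes/sec through gives about 1.33B raw fan-out events, 133M filtered pushes, and 107 MB/sec, in line with the rounded figures above.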


Step 2: High-Level Design (10 min)

Communication Protocol: WebSocket vs Polling

Option 1: Polling (HTTP):

Client asks server: "Any location updates?" every N seconds
↓
GET /v1/friends/locations

Problems:
- Wasted requests if nothing changed (N=3 → 20 req/min per user)
- At 100M users × 20 req/min ≈ 33M req/sec: too expensive for mostly-empty responses
- Latency = up to N seconds before seeing a friend move

Option 2: Long Polling:

Client makes request, server holds it open until update available
↓
Better than polling (no wasted cycles if nothing changes)
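But: The server still holds ~100M open requests, and each delivered update completes the request, so the client must immediately re-issue it (full HTTP overhead per message)
And: It only works server → client; the client still needs separate HTTP requests to send its own location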

Option 3: WebSocket ✅ (Best choice):

Persistent, bidirectional, full-duplex TCP connection
Client and server can send messages at any time
↓
Client sends:  location update every 30 sec → server
Server sends:  friend location updates → client (pushed, not pulled)

Advantages:
- Single long-lived connection (vs. new HTTP conn per poll)
- True push: no polling delay, no wasted requests
- Both directions on one socket (client sends location, server pushes friend updates)

API Design

WebSocket messages (JSON over WebSocket):

# Client → Server: User sends location update
{
  "type": "location_update",
  "lat": 37.7749,
  "lng": -122.4194,
  "timestamp": 1713000000
}

# Server → Client: Push a friend's updated location
{
  "type": "friend_location",
  "friend_id": "user_789",
  "friend_name": "Alice",
  "distance_miles": 1.3,
  "last_updated": 1713000027,
  "is_nearby": true
}

# Server → Client: Friend went offline / location expired
{
  "type": "friend_offline",
  "friend_id": "user_789"
}
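A minimal sketch of how a WS server might decode the client's `location_update` and build the two server-to-client messages, using only the JSON shapes shown above. The helper names (`parse_location_update`, `make_friend_location`, `make_friend_offline`) and the validation rules are hypothetical illustrations, not a defined API.

```python
import json

def parse_location_update(raw: str) -> dict:
    """Decode and validate a client -> server location_update (hypothetical helper)."""
    msg = json.loads(raw)
    if msg.get("type") != "location_update":
        raise ValueError(f"unexpected message type: {msg.get('type')!r}")
    lat, lng, ts = msg["lat"], msg["lng"], msg["timestamp"]
    # Reject coordinates outside the valid latitude/longitude ranges
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        raise ValueError("coordinates out of range")
    return {"lat": float(lat), "lng": float(lng), "timestamp": int(ts)}

def make_friend_location(friend_id: str, friend_name: str, distance_miles: float,
                         last_updated: int, radius_miles: float = 5.0) -> str:
    """Build a server -> client friend_location push (hypothetical helper)."""
    return json.dumps({
        "type": "friend_location",
        "friend_id": friend_id,
        "friend_name": friend_name,
        "distance_miles": round(distance_miles, 1),   # approximate distance only
        "last_updated": last_updated,
        "is_nearby": distance_miles <= radius_miles,
    })

def make_friend_offline(friend_id: str) -> str:
    """Build a server -> client friend_offline event (hypothetical helper)."""
    return json.dumps({"type": "friend_offline", "friend_id": friend_id})

update = parse_location_update(
    '{"type": "location_update", "lat": 37.7749, "lng": -122.4194, "timestamp": 1713000000}')
print(make_friend_location("user_789", "Alice", 1.27, 1713000027))
```

Rounding the distance server-side keeps the push consistent with the “approximate distance” requirement even in exact-location mode.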

REST API (for initial page load and settings):

GET  /v1/friends/nearby?radius=5        # Initial friend list on app open
PUT  /v1/user/location-sharing          # Enable/disable sharing
GET  /v1/user/privacy-settings          # Get sharing preferences

High-Level Component Overview

┌────────────────────────────────────────────────────────────────────────┐
│                     Mobile Client (User A)                             │
│     Sends location every 30 sec via WebSocket                          │
│     Receives friend updates via WebSocket (server-pushed)              │
└─────────────────────────────┬──────────────────────────────────────────┘
                              │ WebSocket
                    ┌─────────▼───────────┐
                    │   Load Balancer     │
                    │ (Layer 4 / Layer 7) │
                    └─────────┬───────────┘
                              │ sticky session (same WS server)
              ┌───────────────▼──────────────────┐
              │        WebSocket Server Pool     │
              │  (WS-1)  (WS-2)  (WS-3)  ...     │
              └───────────────┬──────────────────┘
                              │
          ┌───────────────────┼────────────────────┐
          │                   │                    │
┌─────────▼──────┐  ┌─────────▼───────┐  ┌─────────▼────────┐
│ Location Cache │  │  Redis Pub/Sub  │  │  User DB         │
│ (Redis)        │  │  (per-user      │  │  (friend graph)  │
│ lat/lng + TTL  │  │   channels)     │  │  PostgreSQL      │
└────────────────┘  └─────────────────┘  └──────────────────┘

Location Update Flow (Core Pipeline)

Step 1: User A sends GPS coordinates via WebSocket
        {type: "location_update", lat: 37.77, lng: -122.42, ts: 1713000000}

Step 2: WebSocket server (WS-1) receives update
        - Validates user is opted in
        - Stores location in Redis: SET user:A:location "37.77,-122.42,1713000000" EX 60

Step 3: WS-1 publishes to User A's channel in Redis Pub/Sub
        PUBLISH user:A:location "37.77,-122.42,1713000000"

Step 4: All WebSocket servers that have friends of A subscribed
        receive the published message via their Redis subscriber

Step 5: Each receiving WS server:
        - Looks up which of its connected clients are friends of A
        - Checks if A is within each friend's configured radius
        - If nearby: pushes friend_location update via WebSocket to that friend
        - If not nearby: skips (no message sent)
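The five steps can be simulated in a single process to make the data flow concrete. This is a sketch, not production code: `FakeRedis` is a hypothetical in-memory stand-in for the Redis location cache (`SET ... EX 60`) and Pub/Sub (`PUBLISH`/`SUBSCRIBE`), and `WSServer` is a hypothetical class that tracks its connected users' friend lists and records pushes in an `outbox` instead of writing to real sockets. Distance uses the same Haversine formula the document uses for its radius check.

```python
import math
import time
from collections import defaultdict

def distance_miles(lat1, lng1, lat2, lng2):
    """Haversine great-circle distance in miles (Earth radius 3959 mi)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 3959 * 2 * math.asin(min(1.0, math.sqrt(a)))

class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis location cache and Pub/Sub."""
    def __init__(self):
        self.kv = {}                          # key -> (value, expires_at)
        self.channels = defaultdict(set)      # channel -> subscriber callbacks

    def set(self, key, value, ex):            # like SET key value EX ex
        self.kv[key] = (value, time.time() + ex)

    def get(self, key):                       # None if missing or past its TTL
        item = self.kv.get(key)
        return item[0] if item and item[1] > time.time() else None

    def subscribe(self, channel, callback):
        self.channels[channel].add(callback)  # set: at most one subscription per server

    def publish(self, channel, message):      # fire-and-forget, no persistence
        for callback in self.channels[channel]:
            callback(channel, message)

class WSServer:
    """Hypothetical WS server; pushes are recorded in `outbox`, not sent."""
    def __init__(self, redis, radius_miles=5.0):
        self.redis, self.radius = redis, radius_miles
        self.friends = {}                     # connected user_id -> set of friend ids
        self.outbox = defaultdict(list)       # user_id -> [(friend_id, miles)]

    def connect(self, user_id, friend_ids):
        self.friends[user_id] = set(friend_ids)
        for f in friend_ids:
            self.redis.subscribe(f"user:{f}:location", self.on_message)

    def location_update(self, user_id, lat, lng, ts):
        value = f"{lat},{lng},{ts}"
        self.redis.set(f"user:{user_id}:location", value, ex=60)   # Step 2
        self.redis.publish(f"user:{user_id}:location", value)      # Step 3

    def on_message(self, channel, message):                        # Steps 4-5
        sender = channel.split(":")[1]
        lat, lng, _ = map(float, message.split(","))
        for user_id, friend_ids in self.friends.items():
            if sender not in friend_ids:
                continue
            mine = self.redis.get(f"user:{user_id}:location")
            if mine is None:
                continue                      # own location unknown or expired
            my_lat, my_lng, _ = map(float, mine.split(","))
            d = distance_miles(my_lat, my_lng, lat, lng)
            if d <= self.radius:              # nearby: push; otherwise skip
                self.outbox[user_id].append((sender, round(d, 1)))

# Usage mirroring the text: B and D on WS-1, C on WS-2, all friends of A
fake_redis = FakeRedis()
ws1, ws2 = WSServer(fake_redis), WSServer(fake_redis)
ws1.connect("B", ["A"])
ws1.connect("D", ["A"])
ws2.connect("C", ["A"])
ws1.location_update("B", 37.7800, -122.4200, 1713000000)   # B in San Francisco
ws2.location_update("C", 40.7128, -74.0060, 1713000000)    # C in New York
ws1.location_update("A", 37.7749, -122.4194, 1713000005)   # A moves; D has no location yet
```

Because the callback set de-duplicates, WS-1 receives A's update once even though both B and D are connected there, and then decides per local user whether to push: B gets a push, D is skipped (no known location), and C on WS-2 is skipped as too far away.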

Step 3: Deep Dive (20 min)

Redis Pub/Sub for Location Fan-Out

Core design: Each user gets their own Redis Pub/Sub channel. Their friends’ WebSocket servers subscribe to that channel. When a user moves, one PUBLISH delivers the update to all their friends’ servers simultaneously.

Channels in Redis Pub/Sub:
  channel name: "user:{user_id}:location"

User A's friends are B (connected to WS-1), C (connected to WS-2), D (connected to WS-1):
  WS-1 subscribes to channel "user:A:location"  (because B and D are connected to WS-1)
  WS-2 subscribes to channel "user:A:location"  (because C is connected to WS-2)

When A moves:
  WS-server for A publishes: PUBLISH user:A:location "37.77,-122.42,ts"
  → WS-1 receives message (delivers to B, D)
  → WS-2 receives message (delivers to C)

Redis Pub/Sub vs Redis Streams vs Kafka:

System        | Delivery guarantee            | Persistence | Fan-out                    | Use case
Redis Pub/Sub | At most once (no persistence) | No          | Real-time, all subscribers | Location updates (stale data has no value anyway)
Redis Streams | At least once                 | Yes         | Consumer groups            | When you need replay or audit
Kafka         | At least once                 | Yes         | Topic + partitions         | High-throughput durable messaging

Why Redis Pub/Sub fits here: Location updates are ephemeral; if a subscriber misses one update, they’ll get the next one in 30 seconds. No persistence needed. The simplicity wins.


WebSocket Server: Responsibilities

Each WebSocket server instance manages:

State maintained per WS server:
  - Active WebSocket connections: {user_id β†’ websocket_handle}
  - Channel subscriptions:        {user_id β†’ [channel1, channel2, ...]}
                                  (subscribed to all friends' channels)

On new WebSocket connection for User A:
  1. Authenticate user
  2. Fetch friend list from User DB (or cache)
  3. Subscribe to each friend's pub/sub channel:
     for friend_id in friends_of_A:
         redis.subscribe(f"user:{friend_id}:location")
  4. Add to active connections map

On WebSocket disconnect for User A:
  1. Unsubscribe from all friend channels
  2. Remove from active connections map
  3. Mark user offline (or let location key expire via TTL)
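Users on the same WS server often share friends, so a server usually wants one Redis subscription per channel rather than one per (user, friend) pair, and it must not drop a channel on disconnect while another local user still needs it. A minimal reference-counting sketch, assuming a hypothetical `SubscriptionManager` whose `subscribe`/`unsubscribe` callbacks would wrap the real Redis client's SUBSCRIBE/UNSUBSCRIBE:

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set

class SubscriptionManager:
    """Hypothetical per-WS-server bookkeeping with reference-counted channels."""

    def __init__(self, subscribe: Callable[[str], None],
                 unsubscribe: Callable[[str], None]) -> None:
        self._subscribe = subscribe        # assumption: wraps Redis SUBSCRIBE
        self._unsubscribe = unsubscribe    # assumption: wraps Redis UNSUBSCRIBE
        self.connections: Dict[str, object] = {}       # user_id -> websocket handle
        self.user_channels: Dict[str, Set[str]] = {}   # user_id -> friend channels
        self.refcount: Counter = Counter()             # channel -> local users needing it

    def on_connect(self, user_id: str, ws_handle: object,
                   friend_ids: Iterable[str]) -> None:
        if user_id in self.user_channels:              # duplicate connect = reconnect
            self.on_disconnect(user_id)
        channels = {f"user:{f}:location" for f in friend_ids}
        for ch in channels:
            if self.refcount[ch] == 0:
                self._subscribe(ch)                    # first local user needing it
            self.refcount[ch] += 1
        self.user_channels[user_id] = channels
        self.connections[user_id] = ws_handle

    def on_disconnect(self, user_id: str) -> None:
        for ch in self.user_channels.pop(user_id, set()):
            self.refcount[ch] -= 1
            if self.refcount[ch] == 0:
                del self.refcount[ch]
                self._unsubscribe(ch)                  # last local user is gone
        self.connections.pop(user_id, None)

# Usage: record SUBSCRIBE/UNSUBSCRIBE calls instead of talking to Redis
log: List[str] = []
mgr = SubscriptionManager(lambda ch: log.append(f"SUB {ch}"),
                          lambda ch: log.append(f"UNSUB {ch}"))
mgr.on_connect("B", object(), ["A", "E"])
mgr.on_connect("D", object(), ["A"])    # A's channel is already subscribed
mgr.on_disconnect("B")                  # keeps A (D still needs it), drops E
```

The same bookkeeping also bounds the “20M subscriptions per server” figure from above: shared friends collapse into a single channel subscription.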

Subscription count math:

100M users × 400 friends = 40B (user, friend) subscription pairs in total
But: Each subscription is just a Redis channel subscription (a small in-memory entry)
Each WS server handles ~50,000 connections
50,000 connections × 400 friends = up to 20M channel subscriptions per WS server
  (an upper bound: a friend shared by several users on the same server needs only one subscription)
Redis Pub/Sub channels are lightweight, so this is feasible

Location Data Storage

User location in Redis:

Key:   user:{user_id}:location
Type:  String (or Hash)
Value: "lat,lng,timestamp"    e.g., "37.7749,-122.4194,1713000000"
TTL:   60 seconds

# Why TTL?
If a user stops sending updates (app backgrounded, phone off),
the location entry expires. The user appears as "offline" or
location is treated as stale. This prevents friends from seeing
a stale 8-hour-old location as "current."
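The `"lat,lng,timestamp"` string format is easy to encode and decode. A small sketch with hypothetical helpers `encode_location` / `decode_location`, plus a staleness check that mirrors the 60-second TTL (useful on the client, which keeps showing a friend's last push until it gets too old). With a real Redis client the encoded value would be written with the equivalent of `SET user:{id}:location <value> EX 60`.

```python
from typing import NamedTuple, Optional

LOCATION_TTL_SEC = 60  # matches the Redis key TTL

class Location(NamedTuple):
    lat: float
    lng: float
    ts: int  # Unix seconds when the client reported the fix

def encode_location(loc: Location) -> str:
    """Serialize to the "lat,lng,timestamp" string stored under user:{id}:location."""
    return f"{loc.lat},{loc.lng},{loc.ts}"

def decode_location(value: Optional[str]) -> Optional[Location]:
    """Parse the stored string; None means the key was missing or already expired."""
    if value is None:
        return None
    lat, lng, ts = value.split(",")
    return Location(float(lat), float(lng), int(ts))

def is_stale(loc: Optional[Location], now: int, ttl: int = LOCATION_TTL_SEC) -> bool:
    """Treat a location older than the TTL (or absent) as offline."""
    return loc is None or now - loc.ts > ttl

value = encode_location(Location(37.7749, -122.4194, 1713000000))
print(value, is_stale(decode_location(value), now=1713000030))
```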

Hash format for richer data:

HSET user:A:location lat 37.7749 lng -122.4194 ts 1713000000 accuracy 15
EXPIRE user:A:location 60

Location history (optional):

If regulatory or product requirements demand location history:
  Use Redis Sorted Set: ZADD user:A:location_history {ts} "{lat},{lng}"
  Or: Write to a time-series DB (InfluxDB, TimescaleDB)
  Or: Stream to S3/data lake for analytics

For the nearby friends feature itself, history is not needed.

Fan-Out Problem: Celebrity / High-Friend-Count Users

Problem: A user with 5,000 friends sends one location update. That update must be delivered to potentially 5,000 different WebSocket connections.

Normal user (400 friends):   400 deliveries per update  ← manageable
Power user (5,000 friends):  5,000 deliveries per update
Celebrity user (hypothetical, e.g., 100K friends): 100K deliveries per update ← dangerous

Solutions:

Option 1: Hard cap on Nearby Friends feature:

Limit Nearby Friends to users with < 1,000 friends.
Users with very large friend counts use a different "follower" model
(asymmetric) where Nearby Friends is disabled or rate-limited.

Option 2: Server-side filtering before delivery:

Do NOT push to all 400 friends' WebSocket connections.
Instead:
  1. On location update, get friend list
  2. For each friend, check their LAST KNOWN location from Redis
  3. Only push update to friends who are currently within 5 miles
     (or within some larger buffer, e.g., 10 miles)
  4. Friends who are clearly in another city do not receive the update

This dramatically reduces actual deliveries in practice
(most of your 400 Facebook friends are not within 5 miles of you).
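Option 2 can be sketched as a pure function: given the mover's new position, the friend list, and a lookup for each friend's last known location, return only the friends worth notifying, nearest first. The `last_known` callable and the 10-mile `buffer_miles` default (the larger buffer suggested above) are assumptions; in the design the lookup would be a read of `user:{friend_id}:location` from Redis.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Tuple

LatLng = Tuple[float, float]

def haversine_miles(a: LatLng, b: LatLng) -> float:
    (lat1, lng1), (lat2, lng2) = a, b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 3959 * 2 * math.asin(min(1.0, math.sqrt(h)))

def friends_to_notify(mover: LatLng,
                      friend_ids: Iterable[str],
                      last_known: Callable[[str], Optional[LatLng]],
                      buffer_miles: float = 10.0) -> List[Tuple[str, float]]:
    """Return (friend_id, distance) for friends within buffer_miles, nearest first."""
    targets = []
    for fid in friend_ids:
        loc = last_known(fid)            # None: offline or TTL expired, skip
        if loc is None:
            continue
        d = haversine_miles(mover, loc)
        if d <= buffer_miles:
            targets.append((fid, d))
    return sorted(targets, key=lambda t: t[1])

# Usage with a dict standing in for the Redis location cache
cache: Dict[str, LatLng] = {
    "B": (37.7800, -122.4200),   # same neighborhood in San Francisco
    "C": (40.7128, -74.0060),    # New York
    "E": (37.8044, -122.2712),   # Oakland, across the bay
}
print(friends_to_notify((37.7749, -122.4194), ["B", "C", "D", "E"], cache.get))
```

Filtering against a buffer wider than the 5-mile display radius keeps friends who are about to come into range receiving updates; the receiving side can still apply each user's exact radius.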

Option 3: Client-side filtering:

Push update to all friends, let client filter.
Simpler server logic but wastes bandwidth.
Not scalable at 100M users.

Option 4: Async fan-out with message queue:

Instead of synchronous fan-out on the hot path:
  WS server → Kafka → Fan-out workers → check nearby → push via WS

Adds latency but decouples the write path from delivery.
Good for bursty users. Complicates architecture.
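Option 4 moves the expensive work off the hot path: the WS server only enqueues, and a pool of workers does the friend lookup, distance check, and delivery. A sketch with `asyncio.Queue` standing in for Kafka; the worker count, the stubbed `nearby_friends` lookup, and the `deliver` callback are hypothetical placeholders. An in-process queue provides none of Kafka's durability or partitioning.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Update = Tuple[str, float, float]  # (user_id, lat, lng)

async def fanout_worker(queue: "asyncio.Queue[Update]",
                        nearby_friends: Callable[[Update], List[str]],
                        deliver: Callable[[str, Update], Awaitable[None]]) -> None:
    while True:
        update = await queue.get()
        try:
            for friend_id in nearby_friends(update):   # filter before pushing
                await deliver(friend_id, update)
        finally:
            queue.task_done()

async def main() -> List[Tuple[str, str]]:
    queue: "asyncio.Queue[Update]" = asyncio.Queue(maxsize=10_000)  # bounded: back-pressure
    delivered: List[Tuple[str, str]] = []

    def nearby_friends(update: Update) -> List[str]:
        return {"A": ["B", "D"], "C": []}.get(update[0], [])  # stub lookup

    async def deliver(friend_id: str, update: Update) -> None:
        delivered.append((update[0], friend_id))               # stand-in for a WS push

    workers = [asyncio.create_task(fanout_worker(queue, nearby_friends, deliver))
               for _ in range(4)]
    # The WS server's hot path only enqueues and returns immediately
    await queue.put(("A", 37.7749, -122.4194))
    await queue.put(("C", 40.7128, -74.0060))
    await queue.join()                   # wait until every queued update is processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return delivered

delivered = asyncio.run(main())
print(delivered)
```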

Recommended approach: Option 2 (server-side filtering) + Option 1 (friend count cap). This is what Facebook’s engineering blog describes.


WebSocket Server Scaling with Consistent Hashing

Problem: With N WebSocket servers, how do you ensure that when User A’s location is published, the right servers receive it?

Answer: Redis Pub/Sub handles this automatically!
  - All WS servers subscribe to Redis channels for their connected users' friends
  - When PUBLISH fires, Redis delivers to ALL subscribers across ALL servers
  - No need for consistent hashing on the WS tier (Redis does the routing)

But: For connection management, use consistent hashing to assign
users to WS servers, reducing subscription churn:
  - Hash(user_id) → WS server index
  - Same user always reconnects to same WS server (if healthy)
  - On server failure: reassign affected users → re-subscribe to channels
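A consistent-hash ring with virtual nodes is one common way to implement `Hash(user_id) → WS server index`. This sketch uses MD5 from `hashlib` purely for uniform spreading (not for security); the server names and the 100 virtual nodes per server are assumptions, and in practice the load balancer or a connection router would own the ring.

```python
import bisect
import hashlib
from typing import Dict, List

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[int] = []          # sorted virtual-node hashes
        self._owner: Dict[int, str] = {}    # virtual-node hash -> server
        for s in servers:
            self.add(s)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{server}#{i}")
            bisect.insort(self._ring, h)
            self._owner[h] = server

    def remove(self, server: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{server}#{i}")
            self._ring.remove(h)
            del self._owner[h]

    def server_for(self, user_id: str) -> str:
        # First virtual node clockwise from the user's hash, wrapping around the ring
        idx = bisect.bisect(self._ring, _hash(user_id)) % len(self._ring)
        return self._owner[self._ring[idx]]

ring = HashRing(["ws-1", "ws-2", "ws-3", "ws-4"])
users = [f"user_{n}" for n in range(10_000)]
before = {u: ring.server_for(u) for u in users}
ring.remove("ws-3")                                  # simulate a crashed server
after = {u: ring.server_for(u) for u in users}
moved = [u for u in users if before[u] != after[u]]
print(f"{len(moved)} of {len(users)} users reassigned")
```

Only the users that lived on the failed server are reassigned (spread across the survivors); everyone else keeps their server and their existing channel subscriptions.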

Scaling diagram:

                       Redis Pub/Sub Cluster
                       (sharded by channel)
                       ┌──────────────────┐
WS-1 ────subscribe────►│ user:A:location  │
WS-2 ────subscribe────►│ user:B:location  │
WS-3 ────subscribe────►│ user:C:location  │
                       └──────────────────┘
                              ▲
                    PUBLISH   │   (WS-1 publishes when User A moves)
WS-1 (handles User A) ────────┘

Privacy Controls

Graduated location sharing:

Setting       | Behavior                                         | Implementation
Off (default) | Location never shared                            | Do not subscribe friends; do not store location
Friends only  | All friends see location                         | Standard flow described above
Close friends | Only “close friends” list                        | Filter friend list to tagged subset before subscribing
Approximate   | Show “about X miles away” without exact location | Server rounds lat/lng to nearest 0.1 degree (~11 km precision) before publishing
Scheduled     | Share only during certain hours                  | Check time window before allowing location updates
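The Approximate and Scheduled rows reduce to small server-side checks. A sketch assuming the 0.1° grid from the table and a hypothetical sharing window given as local start/end times; a window that crosses midnight (for example 22:00 to 06:00) is handled.

```python
from datetime import time as dtime
from typing import Tuple

def approximate(lat: float, lng: float, step_deg: float = 0.1) -> Tuple[float, float]:
    """Snap coordinates to a 0.1-degree grid (~11 km in latitude) before publishing."""
    def snap(v: float) -> float:
        # second round() strips floating-point noise such as 37.800000000000004
        return round(round(v / step_deg) * step_deg, 6)
    return snap(lat), snap(lng)

def within_sharing_window(now: dtime, start: dtime, end: dtime) -> bool:
    """True if location updates are allowed at local time `now` (hypothetical rule)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end    # window wraps past midnight

print(approximate(37.7749, -122.4194))
print(within_sharing_window(dtime(23, 0), dtime(22, 0), dtime(6, 0)))
```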

Opt-out flow:

User disables Nearby Friends:
  1. DELETE user:{user_id}:location from Redis (remove live location)
  2. UNSUBSCRIBE all channel subscriptions for this user on their WS server
  3. Unsubscribe all friends' WS servers from user:{user_id}:location channel
  4. Send friend_offline event to all connected friends

Privacy-first design principles:

  • Location is opt-in, not opt-out
  • Location data has a short TTL (60 seconds) and is never permanently stored by default
  • Users control granularity (exact vs approximate)
  • Users can pause sharing without fully disabling the feature

Handling Edge Cases

User goes offline:

1. WebSocket connection closes (app killed, network drops)
2. WS server detects TCP disconnect
3. WS server:
   - Removes user from active connections
   - Unsubscribes from all friend channels
4. Location key in Redis expires via TTL (60 seconds)
5. After TTL expiry, any friend querying user's location gets nothing
6. WS server can proactively send friend_offline to all subscribed friends

Network partition / WS server crash:

1. Load balancer detects unhealthy WS server (health check fails)
2. All affected clients reconnect via WebSocket to a new WS server
3. New WS server re-subscribes to all friend channels
4. Location data in Redis survives (different tier)
5. Client re-sends current location on reconnect → gap in updates ≤ 30 sec

Friend list changes:

User A adds User B as friend:
  1. Update friend graph in User DB
  2. User A's WS server subscribes to user:B:location channel
  3. User B's WS server subscribes to user:A:location channel
  4. Both start receiving each other's updates immediately

User A unfriends User B:
  1. Update friend graph in User DB
  2. User A's WS server unsubscribes from user:B:location channel
  3. User B's WS server unsubscribes from user:A:location channel

Efficient Distance Calculation

For checking “is friend within 5 miles?”, use the Haversine formula:

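import math

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two (lat, lng) points given in degrees."""
    R = 3959  # Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi/2)**2 + math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2
    a = min(a, 1.0)  # guard against floating-point overshoot before sqrt(1 - a)
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

# Fast pre-filter: bounding-box check before Haversine
# 1 degree lat ≈ 69 miles, 1 degree lng ≈ 69*cos(lat) miles
def is_possibly_nearby(lat1, lng1, lat2, lng2, radius_miles=5):
    margin = 1.5  # safety factor: borderline friends go on to the exact Haversine check
    lat_diff = abs(lat1 - lat2)
    lng_diff = abs(lng1 - lng2)
    lng_diff = min(lng_diff, 360.0 - lng_diff)  # wrap across the ±180° antimeridian
    # Quick rejection: the latitude gap alone already exceeds the (padded) radius
    if lat_diff > radius_miles / 69.0 * margin:
        return False
    # Quick rejection: the longitude gap, scaled by cos(latitude), exceeds the radius
    if lng_diff > radius_miles / (69.0 * math.cos(math.radians(lat1))) * margin:
        return False
    return True  # Inside the box: confirm with haversine_miles()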

At scale: The bounding box pre-filter eliminates the vast majority of friend-distance checks (most friends are in different cities), making Haversine calls rare.


Design Summary

Final Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    Mobile Client (User A)                           │
│   Sends: location every 30 sec (WebSocket)                          │
│   Receives: friend location updates (WebSocket, server-pushed)      │
└────────────────────────────────┬────────────────────────────────────┘
                                 │ WebSocket (persistent)
                        ┌────────▼────────┐
                        │  Load Balancer  │
                        │  (sticky by     │
                        │   user_id)      │
                        └────────┬────────┘
                                 │
     ┌───────────────────────────┼───────────────────────────┐
     │                           │                           │
  ┌──▼────┐                   ┌──▼────┐                   ┌──▼────┐
  │ WS-1  │                   │ WS-2  │                   │ WS-3  │  ...
  │(50K   │                   │(50K   │                   │(50K   │
  │ conns)│                   │ conns)│                   │ conns)│
  └──┬──┬─┘                   └──┬──┬─┘                   └──┬──┬─┘
     │  │                        │  │                        │  │
     │  └────────────────────────┼──┼────────────────────────┘  │
     │          subscribe/publish│  │                           │
     │  ┌────────────────────────▼──▼─────────────────────────┐ │
     │  │          Redis Pub/Sub Cluster                      │ │
     │  │  Channel per user: "user:{id}:location"             │ │
     │  │  WS server subscribes on behalf of connected users  │ │
     │  └─────────────────────────────────────────────────────┘ │
     │                                                          │
     └──────────────────┬───────────────────────────────────────┘
                        │  read/write location
              ┌─────────▼───────────┐
              │   Redis Location    │
              │   Cache             │
              │   user:{id}:loc     │
              │   TTL: 60 seconds   │
              └─────────────────────┘

              ┌──────────────────────┐
              │   User DB            │  ← Friend graph, privacy settings
              │   (PostgreSQL)       │
              │   + Redis cache      │
              │   for friend lists   │
              └──────────────────────┘

Key Decisions Summary

Decision            | Choice                                | Reasoning
Protocol            | WebSocket                             | Bidirectional, persistent, server can push; polling wastes 33M req/sec
Fan-out mechanism   | Redis Pub/Sub                         | Each user has a channel; friends’ WS servers subscribe; PUBLISH delivers to all
Location storage    | Redis with 60s TTL                    | Ephemeral; stale location after 60s treated as offline
Friend graph        | PostgreSQL + Redis cache              | Persistence for friend list; cache for hot reads on connect
Fan-out filtering   | Server-side distance check            | Only push to friends who are actually nearby; reduces 400x fan-out to ~40x
Privacy             | Opt-in, TTL, approximate mode         | Location is sensitive: default to private, auto-expire
WS scaling          | Consistent hashing (user → WS server) | Reduces subscription churn on reconnects
WS failure recovery | Reconnect + re-subscribe              | Client reconnects; new WS server re-fetches friend list and re-subscribes

Interview Questions & Answers

Q: Why WebSocket over HTTP polling for location updates?
A: Polling at 5-second intervals for 100M users generates 100M / 5 = 20M requests/sec, most of which return empty responses (nothing changed). This wastes server resources and adds latency. WebSocket maintains a single persistent connection per user, allowing the server to push updates the instant they are available. Both client-to-server (location update) and server-to-client (friend update) use the same connection, which is efficient and low-latency.

Q: How does Redis Pub/Sub enable the fan-out? Walk through an example.
A: When User A connects, their WebSocket server subscribes to the location channels of all of A’s friends (user:B:location, user:C:location, etc.). When User B moves and sends a location update, B’s WebSocket server publishes to user:B:location. Redis delivers this to every subscriber, including A’s WebSocket server (because A is B’s friend). A’s WebSocket server checks: is B within A’s 5-mile radius? If yes, push a friend_location message to A’s open WebSocket connection.

Q: How do you handle the fan-out problem for a user with thousands of friends?
A: Three complementary approaches: (1) Hard cap on friend count eligible for Nearby Friends (e.g., max 500 friends). (2) Server-side filtering: before publishing, pre-filter to friends who are currently in the same geographic area (same city/region based on last known location). This avoids waking up WS servers in other regions for a clearly non-nearby friend. (3) If a user has too many active nearby friends, rate-limit update frequency (e.g., push at most once per 60 sec for users with > 1000 friends nearby).

Q: How do you handle privacy for a user who doesn’t want to share their location?
A: Multi-layer controls: (1) Opt-in requirement: sharing is off by default. (2) Redis key is never written if sharing is disabled. (3) Friend subscriptions are never created. (4) When a user disables sharing mid-session: immediately delete their location key, unsubscribe all friend channels from their updates, send friend_offline to current nearby friends. (5) Approximate mode: publish rounded coordinates (0.1° ≈ 11 km) instead of exact GPS coordinates.

Q: How would you scale WebSocket servers as users grow?
A: WebSocket servers are stateful (they hold open connections and channel subscriptions), but they are independently scalable. Each server handles ~50K connections, so 100M users need 100M / 50K = 2,000 WebSocket servers. Use consistent hashing (hash user_id → server index) so reconnecting users go back to the same server, avoiding churn in Redis subscriptions. For server failure, the load balancer health-checks and reroutes affected clients to healthy servers; they re-subscribe on reconnect. Redis Pub/Sub cluster (sharded by channel name) scales independently of WebSocket servers.


Key Takeaways

  1. WebSocket is mandatory for real-time location push: polling 100M users every few seconds creates tens of millions of wasted requests per second.
  2. Redis Pub/Sub provides elegant fan-out: one PUBLISH delivers a location update to all friends’ WebSocket servers simultaneously, without the publisher needing to know which servers to contact.
  3. Location data is ephemeral: use a short TTL (60 seconds) in Redis; once the TTL expires, the user is treated as offline. No long-term persistence needed for the core feature.
  4. Fan-out is the hardest scaling challenge: 400 friends × 3.3M updates/sec = 1.3B fan-out events/sec raw. Server-side distance filtering reduces actual deliveries by ~90% (most friends are not nearby).
  5. WebSocket servers are stateful but independently scalable; consistent hashing minimizes subscription churn when users reconnect.
  6. Privacy must be designed in from the start: opt-in, ephemeral storage, granular controls (approximate mode, scheduled sharing, close-friends-only), and immediate effect on opt-out.
  7. Key difference from proximity service: Proximity service is query-based (user asks → system responds). Nearby Friends is event-driven (system proactively pushes when friends move). The architecture shifts from request/response to streaming and pub/sub.


Last Updated: 2026-04-13
Status: 🟩 Interview ready. Know the WebSocket + Redis Pub/Sub fan-out flow end-to-end!