Chapter 2: Design Nearby Friends
volume2 nearby-friends real-time websocket location
Status: 🟩 Interview ready
Difficulty: Hard
Time to complete: 55 min read + practice
Overview
Nearby Friends is a feature (e.g., Facebook Nearby Friends) that shows you which of your friends are physically close to you right now, updating in near-real-time as everyone moves. Unlike a proximity service where you query for static businesses, here the data changes constantly and must be pushed to users automatically.
Why this matters:
- Tests real-time system design (WebSocket, Pub/Sub, fan-out)
- Combines location tracking, social graph traversal, and streaming
- Hard problem: fan-out at scale (millions of location updates per second, over a billion raw fan-out events per second)
Problem Statement
Design a nearby friends feature that:
- Shows which of your friends are within a configurable radius (default 5 miles)
- Updates automatically in near-real-time as you and friends move
- Only shares location with direct friends (not strangers)
- Respects user privacy (opt-in, opt-out, location sharing controls)
Step 1: Requirements & Scope (5 min)
Functional Requirements
Clarifying questions:
- What does "nearby" mean? → Default 5 miles, but user-configurable
- How often do locations update? → Every 30 seconds when the user is actively moving
- Is it opt-in or opt-out? → Users must opt in to share their location
- Who can see my location? → Only mutual friends (friend list is bidirectional)
- What data is shown? → Friend name, approximate distance ("0.5 miles away"), last updated time
- Does it work in background? → Yes, as long as the app has location permission
Scope:
- Track and broadcast live location of opted-in users to their friends
- Show friends within configurable radius, sorted by distance
- Respect friendship graph and privacy settings
- Handle friend going offline / location going stale
Non-Functional Requirements
- Low latency: Location updates appear within a few seconds
- High availability: Real-time features are conspicuous when they break
- Scalability: 1B total users, 10% active on Nearby Friends = 100M users
- Privacy: Location data is ephemeral (no permanent history required)
- Eventual consistency: A friend appearing 5 sec late is fine; 60 sec is not
Capacity Estimation
Users using Nearby Friends: 100M active
Location update frequency: 1 update per 30 seconds
Location update QPS: 100M / 30 ≈ 3,333,333 ≈ 3.3M writes/sec
Average friends per user: 400
Fan-out per update: 400 subscribers receive each location update
Total fan-out events/sec: 3.3M × 400 = 1.32 billion → ~1.3B/sec
  (but most friends are not nearby, so filter aggressively!)
Effective push notifications/sec: after filtering to within 5 miles,
  assume 10% of friends are nearby → 130M/sec
  (still very high; this is the hard part)
Location data size per update:
  user_id (8 bytes) + lat (8 bytes) + lng (8 bytes) + timestamp (8 bytes) = 32 bytes
  3.3M × 32 bytes = ~100 MB/sec write throughput to location store
Key insight: The fan-out problem is the dominant challenge. Raw fan-out is ~1.3B events/sec but most notifications are for friends who are not nearby and can be filtered server-side.
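The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant names are illustrative, and the 10% "nearby" share is the same assumption used in the estimate, not a measured value.

```python
# Capacity arithmetic for Nearby Friends, using the assumptions stated above.
ACTIVE_USERS = 100_000_000        # 10% of 1B total users
UPDATE_INTERVAL_SEC = 30          # one location update per 30 seconds
AVG_FRIENDS = 400                 # average friends per user
NEARBY_FRACTION = 0.10            # assumed share of friends within the radius
BYTES_PER_UPDATE = 8 + 8 + 8 + 8  # user_id + lat + lng + timestamp

update_qps = ACTIVE_USERS / UPDATE_INTERVAL_SEC
raw_fanout_per_sec = update_qps * AVG_FRIENDS
filtered_pushes_per_sec = raw_fanout_per_sec * NEARBY_FRACTION
write_mb_per_sec = update_qps * BYTES_PER_UPDATE / 1_000_000

print(f"location updates/sec : {update_qps:,.0f}")
print(f"raw fan-out/sec      : {raw_fanout_per_sec:,.0f}")
print(f"filtered pushes/sec  : {filtered_pushes_per_sec:,.0f}")
print(f"write throughput     : {write_mb_per_sec:,.1f} MB/sec")
```

Changing NEARBY_FRACTION shows how sensitive the delivery load is to the filtering assumption.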
Step 2: High-Level Design (10 min)
Communication Protocol: WebSocket vs Polling
Option 1: Polling (HTTP):
Client asks server: "Any location updates?" every N seconds
  ↓
GET /v1/friends/locations
Problems:
- Wasted requests if nothing changed (N=5 → 12 req/min per user)
- At 100M users × 12 req/min = 20M req/sec, which is too expensive for mostly-empty responses
- Latency = up to N seconds before seeing a friend move
Option 2: Long Polling:
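Client makes request, server holds it open until an update is available
  ↓
Better than polling (no wasted cycles if nothing changes)
But: the server still holds ~100M open requests, and every delivered update ends the request, so the client must immediately issue a new one
And: the held request only carries data from server to client; the client still needs separate requests to upload its own location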
Option 3: WebSocket (best choice):
Persistent, bidirectional, full-duplex TCP connection
Client and server can send messages at any time
  ↓
Client sends: location update every 30 sec → server
Server sends: friend location updates → client (pushed, not pulled)
Advantages:
- Single long-lived connection (vs. new HTTP conn per poll)
- True push β no delay, no wasted requests
- Both directions on one socket (client sends location, server pushes friend updates)
API Design
WebSocket messages (JSON over WebSocket):
# Client → Server: User sends location update
{
  "type": "location_update",
  "lat": 37.7749,
  "lng": -122.4194,
  "timestamp": 1713000000
}
# Server → Client: Push a friend's updated location
{
  "type": "friend_location",
  "friend_id": "user_789",
  "friend_name": "Alice",
  "distance_miles": 1.3,
  "last_updated": 1713000027,
  "is_nearby": true
}
# Server → Client: Friend went offline / location expired
{
  "type": "friend_offline",
  "friend_id": "user_789"
}
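A minimal sketch of how a server might parse and build these messages. The field names come from the examples above; the class and function names (LocationUpdate, parse_location_update, friend_location_message) and the coordinate range check are assumptions added for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationUpdate:
    lat: float
    lng: float
    timestamp: int


def parse_location_update(raw: str) -> LocationUpdate:
    """Parse and validate a client -> server location_update message."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") != "location_update":
        raise ValueError("expected a location_update message")
    try:
        lat = float(msg["lat"])
        lng = float(msg["lng"])
        timestamp = int(msg["timestamp"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("missing or malformed field") from exc
    # Reject coordinates that cannot be valid GPS positions.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        raise ValueError("coordinates out of range")
    return LocationUpdate(lat=lat, lng=lng, timestamp=timestamp)


def friend_location_message(friend_id: str, friend_name: str,
                            distance_miles: float, last_updated: int) -> str:
    """Build the server -> client friend_location push message."""
    return json.dumps({
        "type": "friend_location",
        "friend_id": friend_id,
        "friend_name": friend_name,
        "distance_miles": distance_miles,
        "last_updated": last_updated,
        "is_nearby": True,
    })
```

Validating at the edge keeps malformed payloads out of the cache and the pub/sub channel.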
REST API (for initial page load and settings):
GET /v1/friends/nearby?radius=5 # Initial friend list on app open
PUT /v1/user/location-sharing # Enable/disable sharing
GET /v1/user/privacy-settings # Get sharing preferences
High-Level Component Overview
┌──────────────────────────────────────────────────────────┐
│                  Mobile Client (User A)                  │
│        Sends location every 30 sec via WebSocket         │
│  Receives friend updates via WebSocket (server-pushed)   │
└────────────────────────────┬─────────────────────────────┘
                             │ WebSocket
                 ┌───────────▼───────────┐
                 │     Load Balancer     │
                 │  (Layer 4 / Layer 7)  │
                 └───────────┬───────────┘
                             │ sticky session (same WS server)
             ┌───────────────▼───────────────┐
             │     WebSocket Server Pool     │
             │   (WS-1) (WS-2) (WS-3) ...    │
             └───────────────┬───────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
┌───────▼────────┐   ┌───────▼────────┐   ┌───────▼────────┐
│ Location Cache │   │ Redis Pub/Sub  │   │    User DB     │
│    (Redis)     │   │   (per-user    │   │ (friend graph) │
│ lat/lng + TTL  │   │   channels)    │   │   PostgreSQL   │
└────────────────┘   └────────────────┘   └────────────────┘
Location Update Flow (Core Pipeline)
Step 1: User A sends GPS coordinates via WebSocket
{type: "location_update", lat: 37.77, lng: -122.42, timestamp: 1713000000}
Step 2: WebSocket server (WS-1) receives update
- Validates user is opted in
- Stores location in Redis: SET user:A:location "37.77,-122.42,1713000000" EX 60
Step 3: WS-1 publishes to User A's channel in Redis Pub/Sub
PUBLISH user:A:location "37.77,-122.42,1713000000"
Step 4: All WebSocket servers that have friends of A subscribed
receive the published message via their Redis subscriber
Step 5: Each receiving WS server:
- Looks up which of its connected clients are friends of A
- Checks if A is within each friend's configured radius
- If nearby: pushes friend_location update via WebSocket to that friend
- If not nearby: skips (no message sent)
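Steps 2 and 3 can be sketched as a single handler. This is an illustrative outline only: `store` and `publish` are hypothetical callables standing in for the Redis `SET ... EX 60` and `PUBLISH` commands, and `opted_in` is an assumed set of user IDs that have enabled sharing.

```python
from typing import Callable, Set, Tuple

LOCATION_TTL_SEC = 60


def handle_location_update(
    user_id: str,
    lat: float,
    lng: float,
    ts: int,
    opted_in: Set[str],
    store: Callable[[str, str, int], None],  # stand-in for: SET key value EX ttl
    publish: Callable[[str, str], None],     # stand-in for: PUBLISH channel message
) -> bool:
    """Validate opt-in, cache the location with a TTL, then publish it."""
    if user_id not in opted_in:
        return False  # never store or broadcast for users who have not opted in
    key = f"user:{user_id}:location"
    payload = f"{lat},{lng},{ts}"
    store(key, payload, LOCATION_TTL_SEC)
    publish(key, payload)
    return True


def decode_payload(payload: str) -> Tuple[float, float, int]:
    """Inverse of the payload format used above: 'lat,lng,timestamp'."""
    lat, lng, ts = payload.split(",")
    return float(lat), float(lng), int(ts)
```

Passing the storage and publish functions in as parameters keeps the handler testable without a running Redis instance.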
Step 3: Deep Dive (20 min)
Redis Pub/Sub for Location Fan-Out
Core design: Each user gets their own Redis Pub/Sub channel. Their friends' WebSocket servers subscribe to that channel. When a user moves, one PUBLISH delivers the update to all their friends' servers simultaneously.
Channels in Redis Pub/Sub:
channel name: "user:{user_id}:location"
User A's friends are B (connected to WS-1), C (connected to WS-2), D (connected to WS-1):
WS-1 subscribes to channel "user:A:location" (because B and D are connected to WS-1)
WS-2 subscribes to channel "user:A:location" (because C is connected to WS-2)
When A moves:
  WS server for A publishes: PUBLISH user:A:location "37.77,-122.42,ts"
  → WS-1 receives message (delivers to B, D)
  → WS-2 receives message (delivers to C)
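The example above can be simulated with a tiny in-process broker. This is a toy stand-in for Redis Pub/Sub, not a Redis client: the InMemoryPubSub class and its methods are hypothetical, and like Redis Pub/Sub it keeps no history, so a message reaches only the subscribers present at publish time.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Toy broker: fire-and-forget delivery to current subscribers only."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(channel, message)
        return len(callbacks)  # number of subscribers that received the message


# Friends of A, grouped by the WS server they are connected to.
local_friends_of_a = {"WS-1": ["B", "D"], "WS-2": ["C"]}
deliveries = []


def make_handler(server):
    def on_message(channel, message):
        # Each server forwards the update to its own connected friends of A.
        for friend in local_friends_of_a[server]:
            deliveries.append((server, friend, message))
    return on_message


broker = InMemoryPubSub()
for server in local_friends_of_a:
    broker.subscribe("user:A:location", make_handler(server))

receivers = broker.publish("user:A:location", "37.77,-122.42,1713000000")
```

One publish reaches two servers, which between them deliver to three friends; the publisher never needs to know where B, C, and D are connected.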
Redis Pub/Sub vs Redis Streams vs Kafka:
| System | Delivery guarantee | Persistence | Fan-out | Use case |
|---|---|---|---|---|
| Redis Pub/Sub | At most once (no persistence) | No | Real-time, all subscribers | Location updates: stale data has no value anyway |
| Redis Streams | At least once | Yes | Consumer groups | When you need replay or audit |
| Kafka | At least once | Yes | Topic + partitions | High-throughput durable messaging |
Why Redis Pub/Sub fits here: Location updates are ephemeral. If a subscriber misses one update, they'll get the next one in 30 seconds. No persistence is needed, so the simplicity wins.
WebSocket Server: Responsibilities
Each WebSocket server instance manages:
State maintained per WS server:
- Active WebSocket connections: {user_id → websocket_handle}
- Channel subscriptions: {user_id → [channel1, channel2, ...]}
(subscribed to all friends' channels)
On new WebSocket connection for User A:
1. Authenticate user
2. Fetch friend list from User DB (or cache)
3. Subscribe to each friend's pub/sub channel:
     for friend_id in friends_of_A:
         redis.subscribe(f"user:{friend_id}:location")
4. Add to active connections map
On WebSocket disconnect for User A:
1. Unsubscribe from all friend channels
2. Remove from active connections map
3. Mark user offline (or let location key expire via TTL)
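The connect and disconnect bookkeeping might look like the sketch below. Authentication and the friend-list fetch are omitted, `subscribe` and `unsubscribe` are hypothetical callables standing in for the Redis client, and the per-channel reference count is an added assumption so that a server holds only one subscription per channel even when several local users share a friend.

```python
from collections import Counter


class WebSocketServerState:
    """Tracks local connections and the channels this server must listen to."""

    def __init__(self, subscribe, unsubscribe):
        self.connections = {}       # user_id -> websocket handle
        self.friends = {}           # user_id -> set of friend ids
        self._refcount = Counter()  # channel -> number of local users needing it
        self._subscribe = subscribe
        self._unsubscribe = unsubscribe

    def on_connect(self, user_id, handle, friend_ids):
        self.connections[user_id] = handle
        self.friends[user_id] = set(friend_ids)
        for friend_id in sorted(self.friends[user_id]):
            channel = f"user:{friend_id}:location"
            self._refcount[channel] += 1
            if self._refcount[channel] == 1:
                self._subscribe(channel)  # first local user interested in it

    def on_disconnect(self, user_id):
        self.connections.pop(user_id, None)
        for friend_id in sorted(self.friends.pop(user_id, set())):
            channel = f"user:{friend_id}:location"
            self._refcount[channel] -= 1
            if self._refcount[channel] == 0:
                del self._refcount[channel]
                self._unsubscribe(channel)  # last local user is gone
```

With this bookkeeping, if B and D are both friends of A and both connect to the same server, the server subscribes to A's channel once and unsubscribes only after both have disconnected.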
Subscription count math:
100M users × 400 friends = 40B subscriptions total
But: Each subscription is just a Redis channel subscription (a few bytes)
Each WS server handles ~50,000 connections
50,000 connections × 400 friends = 20M channel subscriptions per WS server
Redis Pub/Sub channels are lightweight, so this is feasible
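The same arithmetic as a quick check. The constant names are illustrative; the per-server figure is an upper bound, because users on one server who share a friend need only one subscription to that friend's channel.

```python
# Subscription math, using the figures stated above.
ACTIVE_USERS = 100_000_000
AVG_FRIENDS = 400
CONNECTIONS_PER_SERVER = 50_000

total_subscriptions = ACTIVE_USERS * AVG_FRIENDS
servers_needed = ACTIVE_USERS // CONNECTIONS_PER_SERVER
max_channels_per_server = CONNECTIONS_PER_SERVER * AVG_FRIENDS

print(f"total subscriptions     : {total_subscriptions:,}")
print(f"WebSocket servers needed: {servers_needed:,}")
print(f"max channels per server : {max_channels_per_server:,}")
```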
Location Data Storage
User location in Redis:
Key: user:{user_id}:location
Type: String (or Hash)
Value: "lat,lng,timestamp" e.g., "37.7749,-122.4194,1713000000"
TTL: 60 seconds
# Why TTL?
If a user stops sending updates (app backgrounded, phone off),
the location entry expires. The user appears as "offline" or
location is treated as stale. This prevents friends from seeing
a stale 8-hour-old location as "current."
Hash format for richer data:
HSET user:A:location lat 37.7749 lng -122.4194 ts 1713000000 accuracy 15
EXPIRE user:A:location 60
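The TTL behavior can be illustrated without Redis. The class below is a hypothetical in-memory stand-in that mimics the expiry semantics described above (a value disappears once its TTL has elapsed); the injectable `clock` parameter is an assumption added so the expiry can be tested deterministically.

```python
import time


class TTLLocationStore:
    """In-memory sketch of a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_sec=60):
        self._data[key] = (value, self._clock() + ttl_sec)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value
```

A reader that gets None back treats the friend as offline, which is the behavior the TTL is meant to produce.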
Location history (optional):
If regulatory or product requirements demand location history:
Use Redis Sorted Set: ZADD user:A:location_history {ts} "{lat},{lng}"
Or: Write to a time-series DB (InfluxDB, TimescaleDB)
Or: Stream to S3/data lake for analytics
For the nearby friends feature itself, history is not needed.
Fan-Out Problem: Celebrity / High-Friend-Count Users
Problem: A user with 5,000 friends sends one location update. That update must be delivered to potentially 5,000 different WebSocket connections.
Normal user (400 friends): 400 deliveries per update → manageable
Power user (5,000 friends): 5,000 deliveries per update
Celebrity user (hypothetical, e.g., 100K friends): 100K deliveries per update → dangerous
Solutions:
Option 1: Hard cap on Nearby Friends feature:
Limit Nearby Friends to users with < 1,000 friends.
Users with very large friend counts use a different "follower" model
(asymmetric) where Nearby Friends is disabled or rate-limited.
Option 2: Server-side filtering before delivery:
Do NOT push to all 400 friends' WebSocket connections.
Instead:
1. On location update, get friend list
2. For each friend, check their LAST KNOWN location from Redis
3. Only push update to friends who are currently within 5 miles
(or within some larger buffer, e.g., 10 miles)
4. Friends who are clearly in another city do not receive the update
This dramatically reduces actual deliveries in practice
(most of your 400 Facebook friends are not within 5 miles of you).
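A minimal sketch of the filtering step in Option 2, assuming the friends' last known locations have already been read from the location cache. The function name friends_to_notify and the dictionary shape (friend ID mapped to a (lat, lng) tuple, or None when the entry has expired) are illustrative assumptions.

```python
import math


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two lat/lng points."""
    earth_radius_miles = 3959
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return earth_radius_miles * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def friends_to_notify(sender_location, friend_locations, radius_miles=5.0):
    """Return [(friend_id, distance_miles)] for friends within the radius.

    friend_locations maps friend_id -> (lat, lng), or None if the cached
    location has expired (friend offline or not sharing).
    """
    lat, lng = sender_location
    nearby = []
    for friend_id, location in friend_locations.items():
        if location is None:
            continue  # no live location, nothing to compare against
        distance = haversine_miles(lat, lng, location[0], location[1])
        if distance <= radius_miles:
            nearby.append((friend_id, distance))
    return sorted(nearby, key=lambda pair: pair[1])  # closest first
```

For a sender in San Francisco, a friend a few blocks away is kept while a friend in New York or a friend with an expired location is dropped before any push is attempted.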
Option 3: Client-side filtering:
Push update to all friends, let client filter.
Simpler server logic but wastes bandwidth.
Not scalable at 100M users.
Option 4: Async fan-out with message queue:
Instead of synchronous fan-out on the hot path:
WS server β Kafka β Fan-out workers β check nearby β push via WS
Adds latency but decouples the write path from delivery.
Good for bursty users. Complicates architecture.
Recommended approach: Option 2 (server-side filtering) + Option 1 (friend count cap). This is what Facebook's engineering blog describes.
WebSocket Server Scaling with Consistent Hashing
Problem: With N WebSocket servers, how do you ensure that when User A's location is published, the right servers receive it?
Answer: Redis Pub/Sub handles this automatically!
- All WS servers subscribe to Redis channels for their connected users' friends
- When PUBLISH fires, Redis delivers to ALL subscribers across ALL servers
- No need for consistent hashing on the WS tier (Redis does the routing)
But: For connection management, use consistent hashing to assign
users to WS servers, reducing subscription churn:
- Hash(user_id) → WS server index
- Same user always reconnects to same WS server (if healthy)
- On server failure: reassign affected users → re-subscribe to channels
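One common way to realize the user-to-server mapping is a hash ring with virtual nodes, sketched below. This is an illustration under stated assumptions, not the design's prescribed implementation: the class name, the SHA-256 hash, and the 100 virtual nodes per server are all choices made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps user ids to servers; removing a server moves only its own users."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_server(self, server):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def server_for(self, user_id):
        if not self._ring:
            raise LookupError("no servers in the ring")
        # (h,) sorts before (h, server), so this finds the first point >= h.
        index = bisect.bisect_left(self._ring, (self._hash(user_id),))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring
```

When a server is removed, only users that were assigned to it are remapped; everyone else keeps their server and therefore their existing channel subscriptions, which is the churn reduction described above.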
Scaling diagram:
                           Redis Pub/Sub Cluster
                           (sharded by channel)
                           ┌──────────────────┐
  WS-1 ────subscribe─────► │ user:A:location  │
  WS-2 ────subscribe─────► │ user:B:location  │
  WS-3 ────subscribe─────► │ user:C:location  │
                           └──────────────────┘
                                    ▲
                                    │ PUBLISH (WS-1 publishes when User A moves)
  WS-1 (handles User A) ────────────┘
Privacy Controls
Graduated location sharing:
| Setting | Behavior | Implementation |
|---|---|---|
| Off (default) | Location never shared | Do not subscribe friends; do not store location |
| Friends only | All friends see location | Standard flow described above |
| Close friends | Only "close friends" list | Filter friend list to tagged subset before subscribing |
| Approximate | Show "about X miles away" without exact location | Server rounds lat/lng to nearest 0.1 degree (~11 km precision) before publishing |
| Scheduled | Share only during certain hours | Check time window before allowing location updates |
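The "Approximate" setting amounts to rounding coordinates before they are published. A minimal sketch follows; the function name coarsen_location is hypothetical, and one decimal place corresponds to the 0.1 degree step in the table (roughly 11 km of latitude).

```python
def coarsen_location(lat, lng, decimals=1):
    """Round coordinates so only an approximate position is published.

    decimals=1 keeps 0.1 degree resolution, about 11 km of latitude.
    """
    return round(lat, decimals), round(lng, decimals)


approx = coarsen_location(37.7749, -122.4194)
print(approx)  # (37.8, -122.4)
```

Rounding on the server, before the value is written or published, means the exact position never reaches friends' devices.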
Opt-out flow:
User disables Nearby Friends:
1. DEL user:{user_id}:location in Redis (remove live location)
2. UNSUBSCRIBE all channel subscriptions for this user on their WS server
3. Unsubscribe all friends' WS servers from user:{user_id}:location channel
4. Send friend_offline event to all connected friends
Privacy-first design principles:
- Location is opt-in, not opt-out
- Location data has short TTL (60 seconds) and is never permanently stored by default
- Users control granularity (exact vs approximate)
- Users can pause sharing without fully disabling the feature
Handling Edge Cases
User goes offline:
1. WebSocket connection closes (app killed, network drops)
2. WS server detects TCP disconnect
3. WS server:
- Removes user from active connections
- Unsubscribes from all friend channels
4. Location key in Redis expires via TTL (60 seconds)
5. After TTL expiry, any friend querying user's location gets nothing
6. WS server can proactively send friend_offline to all subscribed friends
Network partition / WS server crash:
1. Load balancer detects unhealthy WS server (health check fails)
2. All affected clients reconnect via WebSocket to a new WS server
3. New WS server re-subscribes to all friend channels
4. Location data in Redis survives (different tier)
5. Client re-sends current location on reconnect → gap in updates ≤ 30 sec
Friend list changes:
User A adds User B as friend:
1. Update friend graph in User DB
2. User A's WS server subscribes to user:B:location channel
3. User B's WS server subscribes to user:A:location channel
4. Both start receiving each other's updates immediately
User A unfriends User B:
1. Update friend graph in User DB
2. User A's WS server unsubscribes from user:B:location channel
3. User B's WS server unsubscribes from user:A:location channel
Efficient Distance Calculation
For checking "is friend within 5 miles?", use the Haversine formula:
import math

def haversine_miles(lat1, lng1, lat2, lng2):
    R = 3959  # Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi/2)**2 + math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

# Fast pre-filter: bounding box check before Haversine
# 1 degree lat ≈ 69 miles, 1 degree lng ≈ 69*cos(lat) miles
def is_possibly_nearby(lat1, lng1, lat2, lng2, radius_miles=5):
    lat_diff = abs(lat1 - lat2)
    lng_diff = abs(lng1 - lng2)
    # Quick rejection: if bounding box is too large, skip Haversine
    if lat_diff > radius_miles / 69.0 * 1.5:
        return False
    if lng_diff > radius_miles / (69.0 * math.cos(math.radians(lat1))) * 1.5:
        return False
    return True  # Needs Haversine confirmation

At scale: The bounding box pre-filter eliminates the vast majority of friend-distance checks (most friends are in different cities), making Haversine calls rare.
Design Summary
Final Architecture
┌──────────────────────────────────────────────────────────────────┐
│                      Mobile Client (User A)                      │
│  Sends: location every 30 sec (WebSocket)                        │
│  Receives: friend location updates (WebSocket, server-pushed)    │
└────────────────────────────────┬─────────────────────────────────┘
                                 │ WebSocket (persistent)
                        ┌────────▼────────┐
                        │  Load Balancer  │
                        │   (sticky by    │
                        │    user_id)     │
                        └────────┬────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
      ┌────▼────┐           ┌────▼────┐           ┌────▼────┐
      │  WS-1   │           │  WS-2   │           │  WS-3   │  ...
      │  (50K   │           │  (50K   │           │  (50K   │
      │  conns) │           │  conns) │           │  conns) │
      └────┬────┘           └────┬────┘           └────┬────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
                    ┌────────────┴────────────────────┐
                    │ subscribe/publish               │ read/write location
┌───────────────────▼──────────────────────┐  ┌───────▼────────────┐
│ Redis Pub/Sub Cluster                    │  │ Redis Location     │
│ Channel per user: "user:{id}:location"   │  │ Cache              │
│ WS servers subscribe on behalf of        │  │ user:{id}:loc      │
│ their connected users                    │  │ TTL: 60 seconds    │
└──────────────────────────────────────────┘  └────────────────────┘

┌──────────────────────┐
│ User DB              │ ← Friend graph, privacy settings
│ (PostgreSQL)         │
│ + Redis cache        │
│ for friend lists     │
└──────────────────────┘
Key Decisions Summary
| Decision | Choice | Reasoning |
|---|---|---|
| Protocol | WebSocket | Bidirectional, persistent, server can push; polling every 5 sec would cost ~20M req/sec |
| Fan-out mechanism | Redis Pub/Sub | Each user has a channel; friends' WS servers subscribe; PUBLISH delivers to all |
| Location storage | Redis with 60s TTL | Ephemeral; stale location after 60s treated as offline |
| Friend graph | PostgreSQL + Redis cache | Persistence for friend list; cache for hot reads on connect |
| Fan-out filtering | Server-side distance check | Only push to friends who are actually nearby, which reduces 400x fan-out to ~40x |
| Privacy | Opt-in, TTL, approximate mode | Location is sensitive, so default to private and auto-expire |
| WS scaling | Consistent hashing (user β WS server) | Reduces subscription churn on reconnects |
| WS failure recovery | Reconnect + re-subscribe | Client reconnects; new WS server re-fetches friend list and re-subscribes |
Interview Questions & Answers
Q: Why WebSocket over HTTP polling for location updates?
A: Polling at 5-second intervals for 100M users generates 100M / 5 = 20M requests/sec, most of which return empty responses (nothing changed). This wastes server resources and adds latency. WebSocket maintains a single persistent connection per user, allowing the server to push updates the instant they are available. Both client-to-server (location update) and server-to-client (friend update) use the same connection, which is efficient and low-latency.
Q: How does Redis Pub/Sub enable the fan-out? Walk through an example.
A: When User A connects, their WebSocket server subscribes to the location channels of all of A's friends (user:B:location, user:C:location, etc.). When User B moves and sends a location update, B's WebSocket server publishes to user:B:location. Redis delivers this to every subscriber, including A's WebSocket server (because A is B's friend). A's WebSocket server checks: is B within A's 5-mile radius? If yes, push a friend_location message to A's open WebSocket connection.
Q: How do you handle the fan-out problem for a user with thousands of friends?
A: Three complementary approaches: (1) Hard cap on friend count eligible for Nearby Friends (e.g., fewer than 1,000 friends). (2) Server-side filtering: before publishing, pre-filter to friends who are currently in the same geographic area (same city/region based on last known location). This avoids waking up WS servers in other regions for a clearly non-nearby friend. (3) If a user has too many active nearby friends, rate-limit update frequency (e.g., push at most once per 60 sec for users above a chosen nearby-friend threshold).
Q: How do you handle privacy for a user who doesn't want to share their location?
A: Multi-layer controls: (1) Opt-in requirement: sharing is off by default. (2) Redis key is never written if sharing is disabled. (3) Friend subscriptions are never created. (4) When a user disables sharing mid-session: immediately delete their location key, unsubscribe all friend channels from their updates, send friend_offline to current nearby friends. (5) Approximate mode: publish rounded coordinates (0.1° ≈ 11 km) instead of exact GPS coordinates.
Q: How would you scale WebSocket servers as users grow?
A: WebSocket servers are stateful (they hold open connections and channel subscriptions), but they are independently scalable. Each server handles ~50K connections, so 100M users need about 2,000 WebSocket servers. Use consistent hashing (hash user_id → server index) so reconnecting users go back to the same server, avoiding churn in Redis subscriptions. For server failure, the load balancer health-checks and reroutes affected clients to healthy servers; they re-subscribe on reconnect. Redis Pub/Sub cluster (sharded by channel name) scales independently of WebSocket servers.
Key Takeaways
- WebSocket is mandatory for real-time location push: polling by 100M users every few seconds creates tens of millions of wasted requests per second.
- Redis Pub/Sub provides elegant fan-out: one PUBLISH delivers a location update to all friends' WebSocket servers simultaneously, without the publisher needing to know which servers to contact.
- Location data is ephemeral: use a short TTL (60 seconds) in Redis; stale location after TTL = user treated as offline. No long-term persistence needed for the core feature.
- Fan-out is the hardest scaling challenge: 400 friends × 3.3M updates/sec = 1.3B fan-out events/sec raw. Server-side distance filtering reduces actual deliveries by ~90% (most friends are not nearby).
- WebSocket servers are stateful but independently scalable; consistent hashing minimizes subscription churn when users reconnect.
- Privacy must be designed in from the start: opt-in, ephemeral storage, granular controls (approximate mode, scheduled sharing, close-friends-only), and immediate effect on opt-out.
- Key difference from proximity service: Proximity service is query-based (user asks → system responds). Nearby Friends is event-driven (system proactively pushes when friends move). The architecture shifts from request/response to streaming and pub/sub.
Related Resources
- ch01-proximity-service: Query-based geospatial search (contrast with this push-based design)
- ch05-consistent-hashing: Used for WebSocket server assignment
- Sub: Fan-out mechanism deep dive
- key-patterns > WebSocket Scaling: Managing stateful connections at scale
Last Updated: 2026-04-13
Status: 🟩 Interview ready. Know the WebSocket + Redis Pub/Sub fan-out flow end-to-end!