Chapter 11: Design a News Feed System

volume1 news-feed social-media fanout cache

Status: 🟩 Interview ready - Very common question!
Difficulty: Medium-Hard
Time to complete: 50 min read + practice


Overview

A news feed is the personalized, constantly updating stream of posts from a user’s friends and followed accounts β€” the core feature of Facebook, Twitter, Instagram, and LinkedIn. At scale, building a fast and fresh feed for hundreds of millions of users is a significant distributed systems challenge.

Why this matters:

  • One of the most frequently asked system design questions (Meta, Twitter/X, LinkedIn)
  • Covers fanout patterns, caching strategies, and the celebrities problem
  • Teaches how social graph queries interact with read-heavy systems

Problem Statement

Design a news feed system that:

  • Lets users publish posts
  • Shows each user a personalized, reverse-chronological feed of posts from their friends
  • Handles 10M+ Daily Active Users (DAU)
  • Supports text, images, and video in posts
  • Is read-heavy (users view feed far more than they post)
  • Scales to users with up to 500 friends

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

  • Mobile + web? β†’ Both
  • What can a post contain? β†’ Text, images, videos
  • Feed sorted how? β†’ Reverse chronological (newest first; ranking can be a follow-up)
  • How many friends max? β†’ 500 (Facebook-style; Twitter has followers, not friends)
  • Is it read-heavy or write-heavy? β†’ Read-heavy (10:1 read/write ratio typical)
  • Post visibility? β†’ Public, friends only, private (simplify to friends-only for interview)

Scale:

10M   Daily Active Users (DAU)
500   Max friends per user
10:1  Read-to-write ratio

Non-Functional Requirements

  • Performance: Feed load < 200ms (users expect instant feed)
  • Availability: High availability (99.9%)
  • Eventual consistency: Slight delay in feed update is acceptable
  • Scalability: Handle traffic spikes (viral posts, celebrity activity)
  • Durability: Posts must not be lost

Step 2: High-Level Design (10 min)

Two Core Operations

A news feed system has exactly two main operations:

1. PUBLISH POST:   User creates a post β†’ stored + distributed to followers
2. RETRIEVE FEED:  User opens app β†’ gets their personalized feed

Feed Publishing Flow

User publishes post
        β”‚
        β–Ό
[Client] ──→ [API Servers] ──→ [Post Service]
                                     β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                    β”‚                β”‚
                    β–Ό                β–Ό
            [Post DB]        [Fanout Service]
          (store post)       (distribute post
                              to followers)
                                     β”‚
                                     β–Ό
                             [News Feed Cache]
                          (pre-compute feeds)
                                     β”‚
                                     β–Ό
                          [Notification Service]
                         (alert followers of new post)

Feed Retrieval Flow

User opens app
        β”‚
        β–Ό
[Client] ──→ [API Servers] ──→ [News Feed Service]
                                        β”‚
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                           β”‚            β”‚
                           β–Ό            β–Ό
                   [News Feed Cache]  [Post DB]
                   (pre-computed      (fallback,
                    feeds in Redis)    cache miss)
                           β”‚
                           β–Ό
                      [Return feed]
                   (post IDs + content)

API Design

Publish a post:

POST /v1/post
Authorization: Bearer <user_jwt>

Request:
{
  "content": "Just had the best coffee!",
  "media_urls": ["https://cdn.example.com/img/abc.jpg"],
  "visibility": "friends"
}

Response 201 Created:
{
  "post_id": "p_9876543",
  "created_at": "2026-04-13T10:00:00Z"
}

Retrieve news feed:

GET /v1/feed?user_id=123&page_token=<cursor>&limit=20
Authorization: Bearer <user_jwt>

Response 200 OK:
{
  "feed": [
    { "post_id": "p_999", "author_id": "u_456", "content": "...", "created_at": "..." },
    { "post_id": "p_888", "author_id": "u_789", "content": "...", "created_at": "..." }
  ],
  "next_page_token": "<cursor_for_next_page>"
}

Step 3: Deep Dive (25 min)

The Fanout Problem

Fanout = distributing a new post to all of a user’s followers.

If user A has 500 friends and posts something, the post must appear in 500 different feeds. At 10M DAU with constant posting activity, this is a massive write amplification challenge.

Two fundamental approaches:

Approach 1: Fanout on Write (Push Model)

How it works:

  • When user publishes a post, immediately write post ID into each follower’s feed cache
  • On feed read: simply return pre-computed feed from cache (very fast)
Post published
      β”‚
      β–Ό
Fanout Service
      β”‚
      β”œβ”€β”€β†’ Feed[follower_1] ← append post_id
      β”œβ”€β”€β†’ Feed[follower_2] ← append post_id
      β”œβ”€β”€β†’ Feed[follower_3] ← append post_id
      └──→ Feed[follower_N] ← append post_id

Feed read: O(1) β€” just read pre-computed cache

Pros:

  • βœ… Extremely fast reads (feed is pre-computed)
  • βœ… Simple read path (just cache lookup)
  • βœ… Real-time β€” feed is always fresh

Cons:

  • ❌ Celebrity problem: If a user has 10M followers, one post = 10M cache writes (extremely slow and expensive)
  • ❌ Wasted computation for inactive users (feed computed but never read)
  • ❌ Hot spots if many followers are online simultaneously

When to use: Works well for users with small follower counts


Approach 2: Fanout on Read (Pull Model)

How it works:

  • When user publishes a post, only store it in the post DB (no fanout)
  • On feed read: query friend list, fetch recent posts from each friend, merge and sort
Feed read requested
        β”‚
        β–Ό
Fetch friend list (social graph DB)
        β”‚
        β–Ό
For each friend: fetch their recent posts
        β”‚
        β–Ό
Merge + sort by timestamp
        β”‚
        β–Ό
Return feed

Pros:

  • βœ… No wasted computation (only compute when user actually reads)
  • βœ… Handles celebrities gracefully (no fanout on post)
  • βœ… Feed is always fresh (no stale cache entries)

Cons:

  • ❌ Slow reads β€” N DB queries per feed fetch (N = number of friends)
  • ❌ Expensive at read time (social graph traversal + post lookups)
  • ❌ Spiky load on post DB when many users open app simultaneously

When to use: Small scale, or for celebrity/influencer accounts


Approach 3: Hybrid Model (Facebook / Twitter Approach)

Best of both worlds:

  • Fanout on write for regular users (< ~1000 followers): Pre-compute feed
  • Fanout on read for celebrities (> ~1000 followers): Merge on demand
User posts something
        β”‚
        β”œβ”€β”€ User has ≀ 1000 followers?
        β”‚         β”‚
        β”‚         β–Ό YES
        β”‚    Fanout on write:
        β”‚    push post_id to each
        β”‚    follower's feed cache
        β”‚
        └── User has > 1000 followers (celebrity)?
                  β”‚
                  β–Ό YES
             Just store post in DB.
             No fanout. On feed read,
             follower fetches celebrity
             posts separately and merges.


Feed read for User X:
   1. Fetch pre-computed feed from cache (regular friends)
   2. For each celebrity User X follows: fetch celebrity's recent posts
   3. Merge lists, sort by timestamp
   4. Return top N results

Threshold: Typically ~1000 followers separates β€œregular user” from β€œcelebrity” (tunable).

This is the industry-standard approach used by Facebook, Twitter, Instagram.


Feed Data Model: Redis Sorted Set

Why Redis sorted set?

  • Sorted set keeps posts ordered by score (score = timestamp)
  • Constant time insert and range query
  • Built-in trim to cap feed size (e.g., keep last 500 posts)

Feed cache structure:

Key:   feed:{user_id}
Type:  Redis Sorted Set (ZSET)
Score: Unix timestamp of post
Value: post_id

Example:
feed:user_123 = {
    "post_p999": 1712920800,  ← newer (higher score = newer)
    "post_p888": 1712917200,
    "post_p777": 1712913600,
    ...
}

Redis commands:

# Add post to feed (fanout write):
ZADD feed:{user_id} {timestamp} {post_id}
 
# Read latest 20 posts (newest first):
ZREVRANGE feed:{user_id} 0 19 WITHSCORES
 
# Trim feed to last 500 posts (save memory):
ZREMRANGEBYRANK feed:{user_id} 0 -501
 
# Remove post (if deleted):
ZREM feed:{user_id} {post_id}

Why not a list?

  • List is ordered by insertion, but posts need to stay ordered by time
  • Sorted set allows merging celebrity posts with pre-computed feed by timestamp

Cache Architecture: 5 Cache Layers

Well-designed news feed systems use multiple specialized caches:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   5 Cache Layers                     β”‚
β”‚                                                      β”‚
β”‚  1. News Feed Cache   feed:{user_id} β†’ [post_ids]   β”‚
β”‚     (pre-computed feed per user, Redis sorted set)   β”‚
β”‚                                                      β”‚
β”‚  2. Content Cache     post:{post_id} β†’ {post data}  β”‚
β”‚     (post content: text, media URLs, author info)    β”‚
β”‚                                                      β”‚
β”‚  3. Social Graph Cache  friends:{user_id} β†’ [ids]   β”‚
β”‚     (friend lists, follower counts, relationship)    β”‚
β”‚                                                      β”‚
β”‚  4. Action Cache      likes:{post_id} β†’ count       β”‚
β”‚     (like counts, comment counts, share counts)      β”‚
β”‚                                                      β”‚
β”‚  5. Counter Cache     followers:{user_id} β†’ count   β”‚
β”‚     (follower count, following count per user)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Cache consistency:

  • Use write-through or write-behind caching
  • TTL on feed cache (e.g., 24h) β€” stale feeds evicted naturally
  • Post DB is the source of truth; cache is a performance layer

Full System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                       News Feed System                           β”‚
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚                  Feed Publishing Path                      β”‚  β”‚
β”‚  β”‚                                                            β”‚  β”‚
β”‚  β”‚  [User] β†’ [API LB] β†’ [Post Service] β†’ [Post DB (MySQL)]   β”‚  β”‚
β”‚  β”‚                              β”‚                            β”‚  β”‚
β”‚  β”‚                              β–Ό                            β”‚  β”‚
β”‚  β”‚                      [Message Queue]                      β”‚  β”‚
β”‚  β”‚                      (Kafka: new post events)             β”‚  β”‚
β”‚  β”‚                              β”‚                            β”‚  β”‚
β”‚  β”‚                              β–Ό                            β”‚  β”‚
β”‚  β”‚                     [Fanout Service]                       β”‚  β”‚
β”‚  β”‚                      (Worker pool)                        β”‚  β”‚
β”‚  β”‚                              β”‚                            β”‚  β”‚
β”‚  β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚  β”‚
β”‚  β”‚              β–Ό                               β–Ό           β”‚  β”‚
β”‚  β”‚      Regular user fans              Celebrity post        β”‚  β”‚
β”‚  β”‚      β†’ ZADD to each                 β†’ Store in Post DB   β”‚  β”‚
β”‚  β”‚        follower's feed cache          only (no fanout)   β”‚  β”‚
β”‚  β”‚                                                           β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚                  Feed Retrieval Path                       β”‚  β”‚
β”‚  β”‚                                                            β”‚  β”‚
β”‚  β”‚  [User] β†’ [API LB] β†’ [News Feed Service]                  β”‚  β”‚
β”‚  β”‚                              β”‚                            β”‚  β”‚
β”‚  β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚  β”‚
β”‚  β”‚              β–Ό                               β–Ό           β”‚  β”‚
β”‚  β”‚     [News Feed Cache]             [Social Graph DB]       β”‚  β”‚
β”‚  β”‚     Redis ZSET                    (who do I follow        β”‚  β”‚
β”‚  β”‚     feed:{user_id}                 that is celebrity?)    β”‚  β”‚
β”‚  β”‚              β”‚                               β”‚            β”‚  β”‚
β”‚  β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚  β”‚
β”‚  β”‚                          β–Ό                               β”‚  β”‚
β”‚  β”‚                   Merge + Sort                            β”‚  β”‚
β”‚  β”‚                   (regular feed + celebrity posts)        β”‚  β”‚
β”‚  β”‚                          β”‚                               β”‚  β”‚
β”‚  β”‚                          β–Ό                               β”‚  β”‚
β”‚  β”‚                 [Content Cache]                           β”‚  β”‚
β”‚  β”‚                 Hydrate post_ids β†’ full post objects      β”‚  β”‚
β”‚  β”‚                          β”‚                               β”‚  β”‚
β”‚  β”‚                          β–Ό                               β”‚  β”‚
β”‚  β”‚                   Return feed to client                   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Supporting Stores                                          β”‚ β”‚
β”‚  β”‚                                                             β”‚ β”‚
β”‚  β”‚  Post DB (MySQL/Postgres)  β†’ source of truth for posts      β”‚ β”‚
β”‚  β”‚  Social Graph DB (MySQL)   β†’ friend/follower relationships  β”‚ β”‚
β”‚  β”‚  CDN                       β†’ media (images, videos)         β”‚ β”‚
β”‚  β”‚  Object Store (S3)         β†’ raw media uploads              β”‚ β”‚
β”‚  β”‚  Notification Service      β†’ alert on new posts             β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Deep Dive: Fanout Service

Fanout worker pseudocode:

def fanout_new_post(post_id, author_id, timestamp):
    # Get author's follower count
    follower_count = social_graph.get_follower_count(author_id)
 
    if follower_count <= CELEBRITY_THRESHOLD:  # e.g., 1000
        # Fanout on write: push to all followers' caches
        followers = social_graph.get_followers(author_id)
        for follower_id in followers:
            # Add to Redis sorted set (score = timestamp)
            redis.zadd(f"feed:{follower_id}", {post_id: timestamp})
            # Trim to max feed size
            redis.zremrangebyrank(f"feed:{follower_id}", 0, -501)
    else:
        # Celebrity: skip fanout, mark as celebrity post in DB
        # Feed retrieval will pull celebrity posts at read time
        post_db.mark_celebrity_post(post_id, author_id)

Fanout at scale:

Regular user with 500 followers posts:
  β†’ 500 Redis ZADD operations
  β†’ Each ~0.1ms β†’ Total ~50ms
  β†’ Parallelized across worker pool β†’ OK

Celebrity with 10M followers posts:
  β†’ 10M Redis ZADD operations β†’ NOT OK (would take minutes)
  β†’ Solution: skip fanout entirely, merge at read time

Deep Dive: Feed Retrieval with Merging

Merging regular feed with celebrity posts:

def get_news_feed(user_id, limit=20):
    # Step 1: Get pre-computed feed from cache (regular friends)
    cached_post_ids = redis.zrevrange(f"feed:{user_id}", 0, limit*2)
 
    # Step 2: Get celebrities this user follows
    celebrities = social_graph.get_followed_celebrities(user_id)
 
    # Step 3: Fetch recent posts from each celebrity
    celebrity_posts = []
    for celebrity_id in celebrities:
        posts = post_db.get_recent_posts(celebrity_id, limit=10)
        celebrity_posts.extend(posts)
 
    # Step 4: Merge and sort by timestamp
    all_posts = list(cached_post_ids) + celebrity_posts
    all_posts.sort(key=lambda p: p.timestamp, reverse=True)
 
    # Step 5: Hydrate post IDs to full post objects (from content cache)
    feed = [content_cache.get(post_id) for post_id in all_posts[:limit]]
 
    return feed

Deep Dive: Post Visibility Rules

Three visibility levels:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Visibility  β”‚ Who sees the post in their feed?       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Public      β”‚ All followers (even non-friends)       β”‚
β”‚ Friends     β”‚ Only mutual friends (bidirectional)    β”‚
β”‚ Private     β”‚ Only the post author                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Enforcement:

  • During fanout on write: only add to feeds of eligible users
  • During fanout on read (celebrities): filter at read time before returning

Deep Dive: Feed Ranking

Pure chronological (simplest):

  • Score = Unix timestamp
  • Newest post = highest score = appears first
  • Fair, predictable, but misses relevance

Engagement-based ranking (real systems):

  • Score = f(recency, likes, comments, shares, relationship_closeness)
  • Posts from close friends rank higher than distant connections
  • Viral posts bubble up even if slightly older

Ranking formula (simplified):

def rank_score(post, user):
    recency     = 1 / (1 + hours_since_posted(post))
    engagement  = log(1 + post.likes + post.comments * 2)
    affinity    = relationship_score(user, post.author)  # 0.0 to 1.0
 
    return recency * 0.5 + engagement * 0.3 + affinity * 0.2

Implementation: Ranking is applied at read time, not fanout time (ranking signals change continuously β€” likes/comments accumulate after post).


Handling Hot Posts (Viral Content)

Problem: A post gets 1M likes in 1 hour. Every read request fetches like count from DB.

Solution: Cache like counts in Redis (Action Cache):

Key:   likes:{post_id}
Value: integer count
TTL:   None (permanent, update on every like)
 
# Increment on like:
INCR likes:{post_id}
 
# Read like count:
GET likes:{post_id}

Periodic sync: Every N minutes, flush Redis like counts to the Post DB for durability.


Graph Traversal for Friend Posts

Social graph storage:

-- Friend relationships (bidirectional)
CREATE TABLE friendships (
    user_id_1   BIGINT,
    user_id_2   BIGINT,
    created_at  TIMESTAMP,
    PRIMARY KEY (user_id_1, user_id_2)
);
 
-- Follower relationships (one-directional, Twitter-style)
CREATE TABLE follows (
    follower_id  BIGINT,
    followee_id  BIGINT,
    created_at   TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

Friend list query (cached in Social Graph Cache):

SELECT user_id_2 AS friend_id
FROM friendships
WHERE user_id_1 = :user_id
UNION
SELECT user_id_1 AS friend_id
FROM friendships
WHERE user_id_2 = :user_id;

Cache friend lists in Redis (TTL 10 min):

Key:   friends:{user_id}
Value: [friend_id_1, friend_id_2, ..., friend_id_N]

Design Summary

Final Architecture Components

ComponentTechnologyPurpose
API ServersStateless servers + Load BalancerRoute publish and retrieve requests
Post DBMySQL (sharded by post_id)Durable storage for all posts
Social Graph DBMySQLFriend/follower relationships
Fanout QueueKafkaAsync post distribution
Fanout WorkersHorizontally scaled consumersPush post_ids to follower feeds
News Feed CacheRedis Sorted Set per userPre-computed feed (fast reads)
Content CacheRedis HashMap per postPost content hydration
Social Graph CacheRedis List per userFriend lists
Action CacheRedis countersLike/comment counts
Counter CacheRedis countersFollower/following counts
CDNCloudFront / FastlyServe media (images/videos)
Object StoreAWS S3Store raw uploaded media

Key Design Decisions Summary

DecisionChoiceReasoning
Fanout modelHybrid (write for regular, read for celebrities)Balance read speed and write cost
Feed storageRedis Sorted Set (score = timestamp)O(log N) insert, O(1) range reads
Celebrity threshold~1000 followersTunable; prevents write amplification
Feed hydrationContent cache (Redis)Avoid N DB queries per feed load
Consistency modelEventual consistencySlight delay acceptable, enables scale
Media storageS3 + CDNSeparate heavy binary data from structured data

Interview Questions & Answers

Q: What is fanout on write vs fanout on read? When do you use each?
A: Fanout on write (push model) pre-computes the feed at publish time β€” fast reads but write amplification. Fanout on read (pull model) computes the feed at read time β€” avoids wasted writes but slow reads. Use hybrid: fanout on write for regular users (fast reads, manageable write cost), fanout on read for celebrities (avoids write amplification of 10M+ fan-out operations).

Q: What is the celebrity problem and how do you solve it?
A: If a celebrity with 10M followers posts, naive fanout on write would require 10M Redis ZADD operations β€” taking minutes and creating a massive write storm. Solution: skip fanout entirely for celebrities. At feed read time, fetch the celebrity’s recent posts directly and merge them into the pre-computed feed. Use a threshold (e.g., > 1000 followers) to classify users as celebrities.

Q: Why use a Redis Sorted Set for the news feed cache?
A: Sorted set (ZSET) is perfect because: 1) Score = timestamp keeps posts naturally ordered, 2) ZREVRANGE gives newest-first retrieval in O(log N + K), 3) ZADD inserts in O(log N) with automatic ordering, 4) ZREMRANGEBYRANK trims old posts to cap memory, 5) Supports merging celebrity posts by timestamp at read time. A plain list would not support efficient merging by timestamp.

Q: How do you handle a post deletion? Does it disappear from feeds?
A: Post deletion has two parts: 1) Mark post as deleted in Post DB (soft delete β€” don’t physically remove), 2) Remove post_id from all followers’ feed caches using ZREM. Option 1 (immediate): iterate all followers’ feed caches and ZREM β€” expensive for popular posts. Option 2 (lazy): mark post deleted in DB, filter deleted posts at read time when hydrating feed. Lazy deletion is preferred at scale. CDN cache invalidation needed for media.

Q: How does pagination work for the news feed?
A: Use cursor-based pagination (not offset-based). Cursor = timestamp of the oldest post in the current page. Next request: ZREVRANGEBYSCORE feed:{user_id} {cursor-1} -inf LIMIT 0 20. Why cursor not offset? Because the feed changes as new posts are inserted β€” offset-based pagination would show duplicate or skipped posts if feed updates between pages. Cursor-based gives stable pagination.

Q: How do you scale the fanout service to handle 10M DAU?
A: Use a message queue (Kafka) to decouple post publishing from fanout. Fanout workers consume from Kafka in parallel β€” each partition handled by one worker. Partition Kafka topic by user_id to ensure all posts from one user go to the same partition (preserves order). Scale worker pool horizontally based on queue lag. Each worker uses Redis pipelining to batch ZADD operations (send 500 commands in one roundtrip instead of 500 roundtrips).


Key Takeaways

  1. Two operations define a news feed system: publish post (write path) and retrieve feed (read path) β€” keep them separate
  2. Fanout on write (push model) = fast reads but write amplification; best for users with small follower counts
  3. Fanout on read (pull model) = avoids write amplification but slow reads; best for celebrities with millions of followers
  4. Hybrid model is the industry standard (Facebook, Twitter): fanout on write for regular users, fanout on read for celebrities
  5. Redis Sorted Set is the ideal data structure for feed storage β€” ordered by timestamp, efficient range queries, supports in-place merging
  6. 5 cache layers make the feed fast: news feed cache, content cache, social graph cache, action cache, counters cache
  7. Celebrity threshold (e.g., > 1000 followers) is tunable β€” finding the right threshold is a trade-off between read latency and write amplification
  8. Eventual consistency is acceptable for feeds β€” a slightly stale feed is far better than a slow one


Last Updated: 2026-04-13
Status: Very common interview question β€” must know hybrid fanout and Redis sorted set!