Chapter 13: Design a Search Autocomplete System

volume1 autocomplete trie search typeahead

Status: 🟩 Interview ready
Difficulty: Medium-Hard
Time to complete: 45 min read + practice

Overview

A search autocomplete system (also called typeahead or search suggestions) shows the top-K most popular query completions as a user types each character. Classic examples: Google search bar showing 5 suggestions as you type, Amazon product search, YouTube video search.

Why this matters:

Very common interview question (Google, Amazon, LinkedIn all ask this)
Teaches trie data structures in a practical context
Combines data pipeline design with low-latency query serving
Forces discussion of trade-offs: accuracy vs speed vs memory

Problem Statement

Design a search autocomplete system that:

Returns top 5 query suggestions as user types each character
Suggestions ranked by historical search frequency
Responds in < 100ms (feels instant to users)
Handles 10M DAU each performing ~10 searches/day
Matches from beginning of query (prefix match), not middle

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

How many suggestions to return? → Top 5
Match from beginning or middle of query? → Beginning only (prefix match, not substring)
Sort by? → Historical search frequency (popularity)
How fresh are suggestions? → Weekly update (not real-time, simplification)
Personalization? → No (same suggestions for all users, mention as extension)
Fuzzy matching? → No (exact prefix only, mention as extension)
Multi-language? → English only (mention multi-language as extension)
Scale? → 10M DAU, ~10 searches per user = 100M searches/day

Scope:

Prefix-based autocomplete (not substring, not fuzzy)
Top 5 results sorted by search frequency
Suggestions updated weekly based on historical data
English only, no personalization

Non-Functional Requirements

Low latency: < 100ms response time (fast enough to keep up with typing)
High availability: 99.9% uptime (search must always work)
Scalability: 10M DAU, 100M searches/day
Freshness: Suggestions updated at least weekly
Relevance: Show truly popular queries, filter spam/offensive content

Capacity Estimation

DAU: 10M
Searches per user per day: 10
Total searches/day: 10M × 10 = 100M/day
Peak QPS: 100M / 86400 × 2 (peak factor) ≈ 2,300 QPS

Characters per search query: avg 20 characters
Autocomplete requests per search: ~20 (one per character typed)
Total autocomplete requests: 100M × 20 = 2B/day
Autocomplete QPS: 2B / 86400 ≈ 23,000 QPS peak ≈ 50K at peak

Storage:
- 10M daily unique queries × 200 bytes (query + metadata) = 2 GB/day
- Weekly: 14 GB of new query data
- Trie for top 5B unique queries: ~100 GB (estimated)

Step 2: High-Level Design (10 min)

Two Separate Services

The autocomplete system splits into two distinct services with very different requirements:

┌─────────────────────────────────────────────────────────┐
│              Search Autocomplete System                 │
│                                                         │
│   ┌─────────────────────┐   ┌───────────────────────┐  │
│   │  Data Gathering     │   │   Query Service       │  │
│   │  Service            │   │                       │  │
│   │                     │   │  - Serve top-5 for    │  │
│   │  - Log all searches │   │    any prefix         │  │
│   │  - Aggregate counts │   │  - < 100ms response   │  │
│   │  - Update weekly    │   │  - Cache in Redis      │  │
│   │  - Build trie       │   │  - Read-heavy          │  │
│   │  - Write-heavy      │   │                       │  │
│   └─────────────────────┘   └───────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Why separate?

Data gathering is batch/offline (weekly rebuild, can be slow)
Query service is online/real-time (must respond in < 100ms)
Different scaling needs: gathering is write-heavy batch, query is read-heavy online

Data Flow Overview

User searches "twitter" → Logged
User searches "twitter bootstrap" → Logged
User searches "twitter api" → Logged
...millions of searches logged weekly...

Weekly batch job:
  1. Aggregate raw logs → count per query
  2. Filter spam/offensive content
  3. Build optimized trie from top queries
  4. Load trie into cache (Redis)

User types "tw":
  → Query Service → Redis cache → Returns ["twitter", "twitch", "twitter bootstrap", ...]

Basic Architecture

[User types "tw"]
        │
        ▼ HTTP GET /autocomplete?prefix=tw
[Load Balancer]
        │
        ▼
[Query Servers]
        │
        ├──→ [Redis Trie Cache] ─── HIT ──→ Return top-5 results
        │                           MISS
        └──────────────────────→ [Trie DB (persistent)] → Return top-5 results
                                    (fallback)

Data Gathering Pipeline:

[Search Logs]
     │
     ▼ (real-time)
[Log Aggregator / Apache Kafka]
     │
     ▼ (weekly batch)
[Spark / Hadoop MapReduce]
  - Count query frequencies
  - Filter and rank
     │
     ▼
[Trie Builder Service]
  - Build optimized trie from top-N queries
  - Serialize to key-value store
     │
     ▼
[Trie DB (HBase/DynamoDB)]
     │
     ▼ (load into cache)
[Redis Trie Cache]

Step 3: Deep Dive - Trie Data Structure (20 min)

What is a Trie?

A trie (prefix tree) is a tree data structure where each node represents a character and paths from root to nodes represent prefixes/words.

Queries: "tree", "trie", "try", "twitter", "twitch"

                root
                 │
                 t
                 │
                 r ─────────────── w
                 │                 │
           ─────────────       ────────────
           e        i   y      i           i
           │        │          │           │
           e        e         tter        tch
           │        │
         [tree]   [trie]    [try]    [twitter]  [twitch]

Node structure:
- children: map of char → child node
- is_word: bool (does a valid query end here?)
- top_queries: list of top-5 queries with this prefix  ← KEY OPTIMIZATION

Naive Trie: How to Find Top-5

Approach: Traverse to prefix node, then do DFS to find all matching queries, sort by frequency.

def get_top_k(trie, prefix, k=5):
    # Step 1: Navigate to prefix node  O(p) where p = prefix length
    node = trie.root
    for char in prefix:
        if char not in node.children:
            return []  # No matches
        node = node.children[char]
 
    # Step 2: DFS to find all queries with this prefix  O(n) where n = descendant nodes
    all_queries = []
    dfs(node, prefix, all_queries)
 
    # Step 3: Sort by frequency and return top-k  O(n log n)
    all_queries.sort(key=lambda q: q.frequency, reverse=True)
    return all_queries[:k]

Naive trie time complexity:

Navigate to prefix: O(p)
DFS to collect all matches: O(n) where n = nodes with this prefix
Sort: O(n log n)
Total: O(p + n log n) — too slow when n is large!

Optimized Trie: Store Top-K at Each Node

Key insight: Cache the top-k results at every node. No need to DFS or sort at query time.

class TrieNode:
    def __init__(self):
        self.children = {}        # char → TrieNode
        self.is_word = False
        self.frequency = 0        # frequency of this exact query
        self.top_queries = []     # Top-k queries for this PREFIX (cached!)
 
def get_top_k_OPTIMIZED(trie, prefix, k=5):
    # Step 1: Navigate to prefix node  O(p)
    node = trie.root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
 
    # Step 2: Return cached top-k (no DFS needed!)  O(1)
    return node.top_queries[:k]

Optimized trie time complexity:

Navigate to prefix: O(p) where p = prefix length
Return cached results: O(1)
Total: O(p) ≈ O(1) for practical prefix lengths (avg 20 chars)

Trade-off: Space vs Speed:

Aspect	Naive Trie	Optimized Trie
Query time	O(p + n log n)	O(p)
Build time	O(n)	O(n log k) per node
Memory	O(n)	O(n × k) — stores top-k at every node
Update	Recompute on insert	Recompute all ancestor nodes
When to use	Toy/small datasets	Production at scale

Memory cost of optimization:

1M unique prefix nodes × 5 results × 50 bytes/result = 250 MB
Very acceptable trade-off for O(1) query performance

Building the Optimized Trie

def build_trie(queries_with_frequency):
    """queries_with_frequency = list of (query_string, count)"""
    root = TrieNode()
 
    # Phase 1: Insert all queries, building the tree
    for query, freq in queries_with_frequency:
        node = root
        for char in query:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_word = True
        node.frequency = freq
 
    # Phase 2: Bottom-up computation of top-k at each node
    # (DFS post-order traversal)
    def compute_top_k(node, current_prefix):
        # Collect top-k from all children
        all_queries = []
        if node.is_word:
            all_queries.append((current_prefix, node.frequency))
        for char, child in node.children.items():
            child_top = compute_top_k(child, current_prefix + char)
            all_queries.extend(child_top)
 
        # Keep only top-k
        all_queries.sort(key=lambda x: x[1], reverse=True)
        node.top_queries = all_queries[:5]
        return node.top_queries
 
    compute_top_k(root, "")
    return root

Trie Serialization to Key-Value Store

Problem: In-memory trie is great for a single server, but:

Can’t fit 100 GB trie in one Redis instance
Can’t reload trie quickly after server restart
Can’t share trie across query servers

Solution: Serialize trie to a distributed key-value store.

Trie Serialization:
Key = prefix string
Value = top-5 queries with frequencies (JSON)

"t"   → [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"tw"  → [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"twi" → [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"twit"→ [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"twitt"→[("twitter",9000), ("twitter api",7000), ("twitter bootstrap",5000), ...]

# Serialize trie to key-value store
def serialize_to_kv(trie_node, prefix, kv_store):
    kv_store.set(prefix, trie_node.top_queries)
    for char, child in trie_node.children.items():
        serialize_to_kv(child, prefix + char, kv_store)
 
# Query using key-value store (no in-memory trie needed!)
def query_autocomplete(prefix, kv_store):
    return kv_store.get(prefix)  # O(1) lookup!

With serialized trie:

Any query server can look up prefix → top-5 in O(1)
Redis serves as the trie cache layer
DB (HBase/DynamoDB) as persistent backup
No need to load entire trie in memory on each server

Data Gathering Pipeline (Detailed)

Step 1: Log Collection
─────────────────────
Every search query logged to Kafka with:
{ query: "twitter", user_id: 123, timestamp: 1640000000 }

Step 2: Weekly Batch Aggregation (Spark/Hadoop)
────────────────────────────────────────────────
Input: Raw search logs from past week
Process:
  - Group by query string
  - Count occurrences
  - Deduplicate (don't count same user twice for same query)
  - Apply decay factor (older queries count less)
Output: (query, weekly_count) pairs

Step 3: Frequency Calculation with Decay
─────────────────────────────────────────
new_frequency = (1 - decay_factor) × old_frequency + decay_factor × weekly_count
decay_factor = 0.1 to 0.5 (tune based on how quickly trends change)
This prevents old popular queries from dominating forever.

Step 4: Filtering
──────────────────
Remove: offensive/profane content (blocklist), spam queries (bot-generated),
        personally identifying info (SSNs, emails in queries)
Method: Blocklist filter + ML classifier for new offensive content

Step 5: Trie Builder
─────────────────────
Take top-N queries (by frequency) after filtering
Build optimized trie: insert all queries, compute top-k at each node
Serialize trie to key-value store (HBase/DynamoDB)

Step 6: Cache Refresh
──────────────────────
New trie data loaded into Redis cache
Old trie data expires (TTL-based or explicit invalidation)

Update frequency: Weekly rebuild is simple and acceptable. Why not real-time?

Trie rebuild is expensive (hours of compute)
Search trends don’t change hourly — weekly is fresh enough
Simpler architecture (no complex incremental update logic)

Query Service Architecture (Detailed)

User types "tw" in search box:
Browser sends: GET /autocomplete?prefix=tw

                     ┌──────────────────────────────┐
                     │      Query Service            │
                     │                              │
Request "tw" ───────►│  1. Check Redis Trie Cache   │
                     │     key = "tw"               │
                     │     result = ["twitter",...] │──► Return top-5 (hit!)
                     │                              │
                     │  2. Cache MISS?              │
                     │     → Query Trie DB (HBase)  │──► Return top-5 + cache result
                     │       key = "tw"             │
                     │                              │
                     └──────────────────────────────┘

Performance optimization: Client-side caching + debouncing:

// Client-side optimization
let debounceTimer;
const cache = new Map();  // In-browser cache
 
searchInput.addEventListener('input', (e) => {
    clearTimeout(debounceTimer);
    const prefix = e.target.value;
 
    // Don't query for very short prefixes
    if (prefix.length < 2) return;
 
    // Check browser cache first
    if (cache.has(prefix)) {
        showSuggestions(cache.get(prefix));
        return;
    }
 
    // Debounce: only fire after 100ms of no typing
    debounceTimer = setTimeout(() => {
        fetch(`/autocomplete?prefix=${prefix}`)
            .then(r => r.json())
            .then(results => {
                cache.set(prefix, results);
                showSuggestions(results);
            });
    }, 100);
});

Trie Sharding for Scale

Problem: Trie data is too large for single server (100 GB+).

Option 1: Shard by starting character (Simple):

Shard 1: a, b, c (queries starting with a/b/c)
Shard 2: d, e, f
...
Shard 9: v, w, x, y, z

Problems:

Uneven distribution (“s” has far more queries than “x”)
Hot shards (popular letters)

Option 2: Shard by prefix range (Better, but complex):

Analyze query distribution.
Shard by frequency-weighted prefix splits:
  Shard 1: a - am  (balanced by query volume)
  Shard 2: an - az + b
  Shard 3: c - ...
  ...

Option 3: Consistent hashing on prefix (Recommended for simplicity):

# Route prefix to shard
def get_shard(prefix):
    shard_id = hash(prefix) % NUM_SHARDS
    return shard_servers[shard_id]
 
# Query
def get_suggestions(prefix):
    shard = get_shard(prefix)
    return shard.query(prefix)

Benefits:

Even distribution if hash function is good
Easy to add shards (re-hash subset)
Consistent routing (same prefix always goes to same shard)

Note: With serialized trie in Redis, sharding is just Redis Cluster’s built-in sharding of keys — the prefix string is the key, Redis Cluster distributes automatically.

Handling Special Cases

Multi-language support:

Separate trie per language:
  - English trie: english_trie_
  - Spanish trie: spanish_trie_
  - Chinese trie: chinese_trie_ (trie on characters, not pinyin)

Route based on user's locale or browser language:
  Accept-Language: zh-CN → query chinese_trie_

Personalization (advanced extension):

Base suggestions = global trie (population-level frequencies)
Personal layer = user's own recent search history (stored per user)

Merge: top-2 personal + top-3 global
Implementation: Redis per-user hash storing recent query counts
                Blend with global trie at query time

Filtering offensive content:

Approach 1: Blocklist filter (for known bad terms)
  - Maintain list of blocked terms
  - Filter during trie build and at query time

Approach 2: ML classifier (for new offensive content)
  - Train classifier on labeled data
  - Run on new queries in data pipeline before trie build

Real-time filter: Block queries containing certain patterns (regex)

Design Summary

Complete Data Flow

┌──────────────────────────────────────────────────────────────────┐
│                     DATA GATHERING SERVICE                       │
│                                                                  │
│  [Search Requests]                                               │
│       │                                                          │
│       ▼ (every search logged)                                    │
│  [Kafka Log Stream]                                              │
│       │                                                          │
│       ▼ (weekly batch)                                           │
│  [Spark Aggregation]                                             │
│   - Count & rank queries                                         │
│   - Apply decay (old queries fade)                               │
│   - Filter spam/offensive                                        │
│       │                                                          │
│       ▼                                                          │
│  [Trie Builder]                                                  │
│   - Build optimized trie (top-5 at each node)                    │
│   - Serialize to key-value store                                 │
│       │                                                          │
│       ▼                                                          │
│  [Trie DB: HBase / DynamoDB]  ──────────────────────────────┐   │
│                                                              │   │
│       ▼ (cache load)                                         │   │
│  [Redis Trie Cache]  ◄───────────────────────────────────── │   │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼ (weekly refresh)
┌──────────────────────────────────────────────────────────────────┐
│                       QUERY SERVICE                              │
│                                                                  │
│  User types "tw"                                                 │
│       │                                                          │
│       ▼                                                          │
│  [Load Balancer]                                                 │
│       │                                                          │
│       ▼                                                          │
│  [Query Servers]                                                 │
│   1. Check Redis cache (key="tw")  ──HIT──► Return top-5        │
│   2. MISS → query Trie DB          ──────► Return top-5 + cache │
│                                                                  │
│  Response: ["twitter","twitch","twitter api","tween","twilio"]   │
└──────────────────────────────────────────────────────────────────┘

Key Design Decisions Summary

Decision	Choice	Reasoning
Data structure	Optimized trie (top-k at each node)	O(p) query time vs O(p + n log n) naive
Storage	Serialized to KV store (HBase/DynamoDB)	Can’t fit trie in one server’s memory
Cache	Redis	O(1) lookup by prefix string as key
Update frequency	Weekly batch	Trends don’t change hourly; rebuild is expensive
Update mechanism	Full rebuild (not incremental)	Simpler; avoids stale partial updates
Filtering	Blocklist + ML classifier	Remove offensive content before trie build
Sharding	Redis Cluster (auto-shards by key)	Prefix = key, Redis distributes automatically
Client optimization	Browser cache + 100ms debounce	Reduce server requests significantly

Interview Questions & Answers

Q: How does the optimized trie achieve O(p) query time?
A: By caching the top-k results at every node during trie construction. Instead of doing a DFS over all descendant nodes and sorting at query time, we precompute the top-k for each prefix during the offline build phase. At query time, we just navigate to the prefix node in O(p) steps and return the cached list in O(1). The trade-off is more memory (store k results at each of n nodes) and longer build time, but query time is O(p) ≈ O(1) in practice.

Q: Why use weekly batch updates instead of real-time trie updates?
A: Trie updates are expensive — updating a single query requires recomputing the top-k list at all ancestor nodes (O(p) nodes each requiring re-sorting). At 100M queries/day, real-time updates would cause constant trie rebuilds. Search trends also don’t change meaningfully within hours. Weekly batch is simpler, cheaper, and fresh enough. For truly real-time (like breaking news), a two-tier approach can be used: a fast approximate layer (recent query log) + the slow accurate layer (weekly trie).

Q: How do you handle a query prefix that is not in the trie?
A: If navigating the trie reaches a dead-end (character not in node’s children), return an empty list immediately in O(p). This means the prefix has no matching popular queries. We can also return a fallback: generic popular queries (trending searches), but this usually looks odd to users, so empty is cleaner. The trie only contains the top-N queries (say top 5 billion), so very rare prefixes legitimately have no suggestions.

Q: How would you add personalization to autocomplete?
A: Two-tier approach: (1) Global trie gives population-level popular suggestions. (2) Per-user layer: store each user’s recent search history (last 30 days) in Redis as a sorted set (query → count). At query time, merge personal results with global results — show up to 2 personal matches + fill remaining slots from global trie. The personal layer is small and fast. Cold users (no history) fall back entirely to global suggestions.

Q: How do you prevent offensive queries from appearing in autocomplete?
A: Multi-layered approach: (1) Blocklist filter at data pipeline level — any query containing blocked terms is removed before trie build. (2) ML classifier for novel offensive content — train on labeled data to detect new patterns not in blocklist. (3) Real-time filter at query serving — regex/blocklist check on suggestions before returning to client. (4) User reporting — flag surfaced suggestions as inappropriate, use as training data. Blocklist handles known terms instantly; ML handles novel cases.

Key Takeaways

Optimized trie stores top-k at every node, achieving O(p) query time at the cost of more memory — the right trade-off for production
Two services with different update cadences: data gathering (weekly batch) vs query serving (real-time, < 100ms)
Serialize trie to key-value store (prefix → top-k) to avoid loading giant trie in memory on every server
Redis as trie cache — prefix string is the key, top-k list is the value — simple and fast
Weekly rebuild is acceptable because search trends don’t change hourly and incremental updates are complex
Client-side debounce + cache reduces server load significantly (100ms debounce can cut requests by 50%+)
Sharding: Redis Cluster handles it automatically since prefix is the key — no manual sharding logic needed

ch04-rate-limiter - Rate limiting autocomplete API (prevent scraping)
distributed-system-components > Key-Value Store - Storage for serialized trie
distributed-system-components > Cache - Redis for trie cache layer
ch06-key-value-store - Building the storage layer
key-patterns > Read-Heavy Systems - Caching patterns for high-read workloads

Practice this design! Common Medium-Hard interview question. Be ready to:

Draw the trie and explain naive vs optimized with complexity
Explain the two-service split (gathering vs query)
Walk through how weekly batch update pipeline works
Discuss serializing trie to key-value store (the key insight)
Handle edge cases: no results, offensive content, personalization

Last Updated: 2026-04-13
Status: Interview ready - Know the trie optimization cold!

Study Notes by Niladri & AI

Explorer

ch13-search-autocomplete

Chapter 13: Design a Search Autocomplete System

Overview

Problem Statement

Step 1: Requirements & Scope (5 min)

Functional Requirements

Non-Functional Requirements

Capacity Estimation

Step 2: High-Level Design (10 min)

Two Separate Services

Data Flow Overview

Basic Architecture

Step 3: Deep Dive - Trie Data Structure (20 min)

What is a Trie?

Naive Trie: How to Find Top-5

Optimized Trie: Store Top-K at Each Node

Building the Optimized Trie

Trie Serialization to Key-Value Store

Data Gathering Pipeline (Detailed)

Query Service Architecture (Detailed)

Trie Sharding for Scale

Handling Special Cases

Design Summary

Complete Data Flow

Key Design Decisions Summary

Interview Questions & Answers

Key Takeaways

Graph View

Table of Contents

Backlinks

Study Notes by Niladri & AI

Explorer

ch13-search-autocomplete

Chapter 13: Design a Search Autocomplete System

Overview

Problem Statement

Step 1: Requirements & Scope (5 min)

Functional Requirements

Non-Functional Requirements

Capacity Estimation

Step 2: High-Level Design (10 min)

Two Separate Services

Data Flow Overview

Basic Architecture

Step 3: Deep Dive - Trie Data Structure (20 min)

What is a Trie?

Naive Trie: How to Find Top-5

Optimized Trie: Store Top-K at Each Node

Building the Optimized Trie

Trie Serialization to Key-Value Store

Data Gathering Pipeline (Detailed)

Query Service Architecture (Detailed)

Trie Sharding for Scale

Handling Special Cases

Design Summary

Complete Data Flow

Key Design Decisions Summary

Interview Questions & Answers

Key Takeaways

Related Resources

Graph View

Table of Contents

Backlinks