Chapter 13: Design a Search Autocomplete System
volume1 autocomplete trie search typeahead
Status: π© Interview ready
Difficulty: Medium-Hard
Time to complete: 45 min read + practice
Overview
A search autocomplete system (also called typeahead or search suggestions) shows the top-K most popular query completions as a user types each character. Classic examples: Google search bar showing 5 suggestions as you type, Amazon product search, YouTube video search.
Why this matters:
- Very common interview question (Google, Amazon, LinkedIn all ask this)
- Teaches trie data structures in a practical context
- Combines data pipeline design with low-latency query serving
- Forces discussion of trade-offs: accuracy vs speed vs memory
Problem Statement
Design a search autocomplete system that:
- Returns top 5 query suggestions as user types each character
- Suggestions ranked by historical search frequency
- Responds in < 100ms (feels instant to users)
- Handles 10M DAU each performing ~10 searches/day
- Matches from beginning of query (prefix match), not middle
Step 1: Requirements & Scope (5 min)
Functional Requirements
Clarifying questions:
- How many suggestions to return? β Top 5
- Match from beginning or middle of query? β Beginning only (prefix match, not substring)
- Sort by? β Historical search frequency (popularity)
- How fresh are suggestions? β Weekly update (not real-time, simplification)
- Personalization? β No (same suggestions for all users, mention as extension)
- Fuzzy matching? β No (exact prefix only, mention as extension)
- Multi-language? β English only (mention multi-language as extension)
- Scale? β 10M DAU, ~10 searches per user = 100M searches/day
Scope:
- Prefix-based autocomplete (not substring, not fuzzy)
- Top 5 results sorted by search frequency
- Suggestions updated weekly based on historical data
- English only, no personalization
Non-Functional Requirements
- Low latency: < 100ms response time (fast enough to keep up with typing)
- High availability: 99.9% uptime (search must always work)
- Scalability: 10M DAU, 100M searches/day
- Freshness: Suggestions updated at least weekly
- Relevance: Show truly popular queries, filter spam/offensive content
Capacity Estimation
DAU: 10M
Searches per user per day: 10
Total searches/day: 10M Γ 10 = 100M/day
Peak QPS: 100M / 86400 Γ 2 (peak factor) β 2,300 QPS
Characters per search query: avg 20 characters
Autocomplete requests per search: ~20 (one per character typed)
Total autocomplete requests: 100M Γ 20 = 2B/day
Autocomplete QPS: 2B / 86400 β 23,000 QPS peak β 50K at peak
Storage:
- 10M daily unique queries Γ 200 bytes (query + metadata) = 2 GB/day
- Weekly: 14 GB of new query data
- Trie for top 5B unique queries: ~100 GB (estimated)
Step 2: High-Level Design (10 min)
Two Separate Services
The autocomplete system splits into two distinct services with very different requirements:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Search Autocomplete System β
β β
β βββββββββββββββββββββββ βββββββββββββββββββββββββ β
β β Data Gathering β β Query Service β β
β β Service β β β β
β β β β - Serve top-5 for β β
β β - Log all searches β β any prefix β β
β β - Aggregate counts β β - < 100ms response β β
β β - Update weekly β β - Cache in Redis β β
β β - Build trie β β - Read-heavy β β
β β - Write-heavy β β β β
β βββββββββββββββββββββββ βββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Why separate?
- Data gathering is batch/offline (weekly rebuild, can be slow)
- Query service is online/real-time (must respond in < 100ms)
- Different scaling needs: gathering is write-heavy batch, query is read-heavy online
Data Flow Overview
User searches "twitter" β Logged
User searches "twitter bootstrap" β Logged
User searches "twitter api" β Logged
...millions of searches logged weekly...
Weekly batch job:
1. Aggregate raw logs β count per query
2. Filter spam/offensive content
3. Build optimized trie from top queries
4. Load trie into cache (Redis)
User types "tw":
β Query Service β Redis cache β Returns ["twitter", "twitch", "twitter bootstrap", ...]
Basic Architecture
[User types "tw"]
β
βΌ HTTP GET /autocomplete?prefix=tw
[Load Balancer]
β
βΌ
[Query Servers]
β
ββββ [Redis Trie Cache] βββ HIT βββ Return top-5 results
β MISS
ββββββββββββββββββββββββ [Trie DB (persistent)] β Return top-5 results
(fallback)
Data Gathering Pipeline:
[Search Logs]
β
βΌ (real-time)
[Log Aggregator / Apache Kafka]
β
βΌ (weekly batch)
[Spark / Hadoop MapReduce]
- Count query frequencies
- Filter and rank
β
βΌ
[Trie Builder Service]
- Build optimized trie from top-N queries
- Serialize to key-value store
β
βΌ
[Trie DB (HBase/DynamoDB)]
β
βΌ (load into cache)
[Redis Trie Cache]
Step 3: Deep Dive - Trie Data Structure (20 min)
What is a Trie?
A trie (prefix tree) is a tree data structure where each node represents a character and paths from root to nodes represent prefixes/words.
Queries: "tree", "trie", "try", "twitter", "twitch"
root
β
t
β
r βββββββββββββββ w
β β
βββββββββββββ ββββββββββββ
e i y i i
β β β β
e e tter tch
β β
[tree] [trie] [try] [twitter] [twitch]
Node structure:
- children: map of char β child node
- is_word: bool (does a valid query end here?)
- top_queries: list of top-5 queries with this prefix β KEY OPTIMIZATION
Naive Trie: How to Find Top-5
Approach: Traverse to prefix node, then do DFS to find all matching queries, sort by frequency.
def get_top_k(trie, prefix, k=5):
# Step 1: Navigate to prefix node O(p) where p = prefix length
node = trie.root
for char in prefix:
if char not in node.children:
return [] # No matches
node = node.children[char]
# Step 2: DFS to find all queries with this prefix O(n) where n = descendant nodes
all_queries = []
dfs(node, prefix, all_queries)
# Step 3: Sort by frequency and return top-k O(n log n)
all_queries.sort(key=lambda q: q.frequency, reverse=True)
return all_queries[:k]Naive trie time complexity:
- Navigate to prefix: O(p)
- DFS to collect all matches: O(n) where n = nodes with this prefix
- Sort: O(n log n)
- Total: O(p + n log n) β too slow when n is large!
Optimized Trie: Store Top-K at Each Node
Key insight: Cache the top-k results at every node. No need to DFS or sort at query time.
class TrieNode:
def __init__(self):
self.children = {} # char β TrieNode
self.is_word = False
self.frequency = 0 # frequency of this exact query
self.top_queries = [] # Top-k queries for this PREFIX (cached!)
def get_top_k_OPTIMIZED(trie, prefix, k=5):
# Step 1: Navigate to prefix node O(p)
node = trie.root
for char in prefix:
if char not in node.children:
return []
node = node.children[char]
# Step 2: Return cached top-k (no DFS needed!) O(1)
return node.top_queries[:k]Optimized trie time complexity:
- Navigate to prefix: O(p) where p = prefix length
- Return cached results: O(1)
- Total: O(p) β O(1) for practical prefix lengths (avg 20 chars)
Trade-off: Space vs Speed:
| Aspect | Naive Trie | Optimized Trie |
|---|---|---|
| Query time | O(p + n log n) | O(p) |
| Build time | O(n) | O(n log k) per node |
| Memory | O(n) | O(n Γ k) β stores top-k at every node |
| Update | Recompute on insert | Recompute all ancestor nodes |
| When to use | Toy/small datasets | Production at scale |
Memory cost of optimization:
- 1M unique prefix nodes Γ 5 results Γ 50 bytes/result = 250 MB
- Very acceptable trade-off for O(1) query performance
Building the Optimized Trie
def build_trie(queries_with_frequency):
"""queries_with_frequency = list of (query_string, count)"""
root = TrieNode()
# Phase 1: Insert all queries, building the tree
for query, freq in queries_with_frequency:
node = root
for char in query:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_word = True
node.frequency = freq
# Phase 2: Bottom-up computation of top-k at each node
# (DFS post-order traversal)
def compute_top_k(node, current_prefix):
# Collect top-k from all children
all_queries = []
if node.is_word:
all_queries.append((current_prefix, node.frequency))
for char, child in node.children.items():
child_top = compute_top_k(child, current_prefix + char)
all_queries.extend(child_top)
# Keep only top-k
all_queries.sort(key=lambda x: x[1], reverse=True)
node.top_queries = all_queries[:5]
return node.top_queries
compute_top_k(root, "")
return rootTrie Serialization to Key-Value Store
Problem: In-memory trie is great for a single server, but:
- Canβt fit 100 GB trie in one Redis instance
- Canβt reload trie quickly after server restart
- Canβt share trie across query servers
Solution: Serialize trie to a distributed key-value store.
Trie Serialization:
Key = prefix string
Value = top-5 queries with frequencies (JSON)
"t" β [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"tw" β [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"twi" β [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"twit"β [("twitter",9000), ("twitch",8500), ("twitter api",7000), ...]
"twitt"β[("twitter",9000), ("twitter api",7000), ("twitter bootstrap",5000), ...]
# Serialize trie to key-value store
def serialize_to_kv(trie_node, prefix, kv_store):
kv_store.set(prefix, trie_node.top_queries)
for char, child in trie_node.children.items():
serialize_to_kv(child, prefix + char, kv_store)
# Query using key-value store (no in-memory trie needed!)
def query_autocomplete(prefix, kv_store):
return kv_store.get(prefix) # O(1) lookup!With serialized trie:
- Any query server can look up prefix β top-5 in O(1)
- Redis serves as the trie cache layer
- DB (HBase/DynamoDB) as persistent backup
- No need to load entire trie in memory on each server
Data Gathering Pipeline (Detailed)
Step 1: Log Collection
βββββββββββββββββββββ
Every search query logged to Kafka with:
{ query: "twitter", user_id: 123, timestamp: 1640000000 }
Step 2: Weekly Batch Aggregation (Spark/Hadoop)
ββββββββββββββββββββββββββββββββββββββββββββββββ
Input: Raw search logs from past week
Process:
- Group by query string
- Count occurrences
- Deduplicate (don't count same user twice for same query)
- Apply decay factor (older queries count less)
Output: (query, weekly_count) pairs
Step 3: Frequency Calculation with Decay
βββββββββββββββββββββββββββββββββββββββββ
new_frequency = (1 - decay_factor) Γ old_frequency + decay_factor Γ weekly_count
decay_factor = 0.1 to 0.5 (tune based on how quickly trends change)
This prevents old popular queries from dominating forever.
Step 4: Filtering
ββββββββββββββββββ
Remove: offensive/profane content (blocklist), spam queries (bot-generated),
personally identifying info (SSNs, emails in queries)
Method: Blocklist filter + ML classifier for new offensive content
Step 5: Trie Builder
βββββββββββββββββββββ
Take top-N queries (by frequency) after filtering
Build optimized trie: insert all queries, compute top-k at each node
Serialize trie to key-value store (HBase/DynamoDB)
Step 6: Cache Refresh
ββββββββββββββββββββββ
New trie data loaded into Redis cache
Old trie data expires (TTL-based or explicit invalidation)
Update frequency: Weekly rebuild is simple and acceptable. Why not real-time?
- Trie rebuild is expensive (hours of compute)
- Search trends donβt change hourly β weekly is fresh enough
- Simpler architecture (no complex incremental update logic)
Query Service Architecture (Detailed)
User types "tw" in search box:
Browser sends: GET /autocomplete?prefix=tw
ββββββββββββββββββββββββββββββββ
β Query Service β
β β
Request "tw" ββββββββΊβ 1. Check Redis Trie Cache β
β key = "tw" β
β result = ["twitter",...] ββββΊ Return top-5 (hit!)
β β
β 2. Cache MISS? β
β β Query Trie DB (HBase) ββββΊ Return top-5 + cache result
β key = "tw" β
β β
ββββββββββββββββββββββββββββββββ
Performance optimization: Client-side caching + debouncing:
// Client-side optimization
let debounceTimer;
const cache = new Map(); // In-browser cache
searchInput.addEventListener('input', (e) => {
clearTimeout(debounceTimer);
const prefix = e.target.value;
// Don't query for very short prefixes
if (prefix.length < 2) return;
// Check browser cache first
if (cache.has(prefix)) {
showSuggestions(cache.get(prefix));
return;
}
// Debounce: only fire after 100ms of no typing
debounceTimer = setTimeout(() => {
fetch(`/autocomplete?prefix=${prefix}`)
.then(r => r.json())
.then(results => {
cache.set(prefix, results);
showSuggestions(results);
});
}, 100);
});Trie Sharding for Scale
Problem: Trie data is too large for single server (100 GB+).
Option 1: Shard by starting character (Simple):
Shard 1: a, b, c (queries starting with a/b/c)
Shard 2: d, e, f
...
Shard 9: v, w, x, y, z
Problems:
- Uneven distribution (βsβ has far more queries than βxβ)
- Hot shards (popular letters)
Option 2: Shard by prefix range (Better, but complex):
Analyze query distribution.
Shard by frequency-weighted prefix splits:
Shard 1: a - am (balanced by query volume)
Shard 2: an - az + b
Shard 3: c - ...
...
Option 3: Consistent hashing on prefix (Recommended for simplicity):
# Route prefix to shard
def get_shard(prefix):
shard_id = hash(prefix) % NUM_SHARDS
return shard_servers[shard_id]
# Query
def get_suggestions(prefix):
shard = get_shard(prefix)
return shard.query(prefix)Benefits:
- Even distribution if hash function is good
- Easy to add shards (re-hash subset)
- Consistent routing (same prefix always goes to same shard)
Note: With serialized trie in Redis, sharding is just Redis Clusterβs built-in sharding of keys β the prefix string is the key, Redis Cluster distributes automatically.
Handling Special Cases
Multi-language support:
Separate trie per language:
- English trie: english_trie_
- Spanish trie: spanish_trie_
- Chinese trie: chinese_trie_ (trie on characters, not pinyin)
Route based on user's locale or browser language:
Accept-Language: zh-CN β query chinese_trie_
Personalization (advanced extension):
Base suggestions = global trie (population-level frequencies)
Personal layer = user's own recent search history (stored per user)
Merge: top-2 personal + top-3 global
Implementation: Redis per-user hash storing recent query counts
Blend with global trie at query time
Filtering offensive content:
Approach 1: Blocklist filter (for known bad terms)
- Maintain list of blocked terms
- Filter during trie build and at query time
Approach 2: ML classifier (for new offensive content)
- Train classifier on labeled data
- Run on new queries in data pipeline before trie build
Real-time filter: Block queries containing certain patterns (regex)
Design Summary
Complete Data Flow
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA GATHERING SERVICE β
β β
β [Search Requests] β
β β β
β βΌ (every search logged) β
β [Kafka Log Stream] β
β β β
β βΌ (weekly batch) β
β [Spark Aggregation] β
β - Count & rank queries β
β - Apply decay (old queries fade) β
β - Filter spam/offensive β
β β β
β βΌ β
β [Trie Builder] β
β - Build optimized trie (top-5 at each node) β
β - Serialize to key-value store β
β β β
β βΌ β
β [Trie DB: HBase / DynamoDB] βββββββββββββββββββββββββββββββ β
β β β
β βΌ (cache load) β β
β [Redis Trie Cache] ββββββββββββββββββββββββββββββββββββββ β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ (weekly refresh)
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β QUERY SERVICE β
β β
β User types "tw" β
β β β
β βΌ β
β [Load Balancer] β
β β β
β βΌ β
β [Query Servers] β
β 1. Check Redis cache (key="tw") ββHITβββΊ Return top-5 β
β 2. MISS β query Trie DB βββββββΊ Return top-5 + cache β
β β
β Response: ["twitter","twitch","twitter api","tween","twilio"] β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Key Design Decisions Summary
| Decision | Choice | Reasoning |
|---|---|---|
| Data structure | Optimized trie (top-k at each node) | O(p) query time vs O(p + n log n) naive |
| Storage | Serialized to KV store (HBase/DynamoDB) | Canβt fit trie in one serverβs memory |
| Cache | Redis | O(1) lookup by prefix string as key |
| Update frequency | Weekly batch | Trends donβt change hourly; rebuild is expensive |
| Update mechanism | Full rebuild (not incremental) | Simpler; avoids stale partial updates |
| Filtering | Blocklist + ML classifier | Remove offensive content before trie build |
| Sharding | Redis Cluster (auto-shards by key) | Prefix = key, Redis distributes automatically |
| Client optimization | Browser cache + 100ms debounce | Reduce server requests significantly |
Interview Questions & Answers
Q: How does the optimized trie achieve O(p) query time?
A: By caching the top-k results at every node during trie construction. Instead of doing a DFS over all descendant nodes and sorting at query time, we precompute the top-k for each prefix during the offline build phase. At query time, we just navigate to the prefix node in O(p) steps and return the cached list in O(1). The trade-off is more memory (store k results at each of n nodes) and longer build time, but query time is O(p) β O(1) in practice.
Q: Why use weekly batch updates instead of real-time trie updates?
A: Trie updates are expensive β updating a single query requires recomputing the top-k list at all ancestor nodes (O(p) nodes each requiring re-sorting). At 100M queries/day, real-time updates would cause constant trie rebuilds. Search trends also donβt change meaningfully within hours. Weekly batch is simpler, cheaper, and fresh enough. For truly real-time (like breaking news), a two-tier approach can be used: a fast approximate layer (recent query log) + the slow accurate layer (weekly trie).
Q: How do you handle a query prefix that is not in the trie?
A: If navigating the trie reaches a dead-end (character not in nodeβs children), return an empty list immediately in O(p). This means the prefix has no matching popular queries. We can also return a fallback: generic popular queries (trending searches), but this usually looks odd to users, so empty is cleaner. The trie only contains the top-N queries (say top 5 billion), so very rare prefixes legitimately have no suggestions.
Q: How would you add personalization to autocomplete?
A: Two-tier approach: (1) Global trie gives population-level popular suggestions. (2) Per-user layer: store each userβs recent search history (last 30 days) in Redis as a sorted set (query β count). At query time, merge personal results with global results β show up to 2 personal matches + fill remaining slots from global trie. The personal layer is small and fast. Cold users (no history) fall back entirely to global suggestions.
Q: How do you prevent offensive queries from appearing in autocomplete?
A: Multi-layered approach: (1) Blocklist filter at data pipeline level β any query containing blocked terms is removed before trie build. (2) ML classifier for novel offensive content β train on labeled data to detect new patterns not in blocklist. (3) Real-time filter at query serving β regex/blocklist check on suggestions before returning to client. (4) User reporting β flag surfaced suggestions as inappropriate, use as training data. Blocklist handles known terms instantly; ML handles novel cases.
Key Takeaways
- Optimized trie stores top-k at every node, achieving O(p) query time at the cost of more memory β the right trade-off for production
- Two services with different update cadences: data gathering (weekly batch) vs query serving (real-time, < 100ms)
- Serialize trie to key-value store (prefix β top-k) to avoid loading giant trie in memory on every server
- Redis as trie cache β prefix string is the key, top-k list is the value β simple and fast
- Weekly rebuild is acceptable because search trends donβt change hourly and incremental updates are complex
- Client-side debounce + cache reduces server load significantly (100ms debounce can cut requests by 50%+)
- Sharding: Redis Cluster handles it automatically since prefix is the key β no manual sharding logic needed
Related Resources
- ch04-rate-limiter - Rate limiting autocomplete API (prevent scraping)
- distributed-system-components > Key-Value Store - Storage for serialized trie
- distributed-system-components > Cache - Redis for trie cache layer
- ch06-key-value-store - Building the storage layer
- key-patterns > Read-Heavy Systems - Caching patterns for high-read workloads
Practice this design! Common Medium-Hard interview question. Be ready to:
- Draw the trie and explain naive vs optimized with complexity
- Explain the two-service split (gathering vs query)
- Walk through how weekly batch update pipeline works
- Discuss serializing trie to key-value store (the key insight)
- Handle edge cases: no results, offensive content, personalization
Last Updated: 2026-04-13
Status: Interview ready - Know the trie optimization cold!