Chapter 13 Flashcards - Search Autocomplete System

flashcards volume1 autocomplete trie search typeahead

What is a trie and how does it enable autocomplete?
?
A trie (prefix tree) is a tree where each node represents a character and paths from root represent prefixes. Root → ‘t’ → ‘w’ → ‘i’ → … represents the prefix “twi”. Every query starting with “twi” shares the same path prefix. To autocomplete “twi”: navigate to the “twi” node, collect all descendant words. Perfect for prefix matching because all queries sharing a prefix share nodes — no need to scan entire dataset.

What is the naive trie approach for finding top-k suggestions and what is its time complexity?
?
Naive approach: 1. Navigate to prefix node: O(p) where p = prefix length. 2. DFS over all descendant nodes to collect matching queries: O(n) where n = number of descendant nodes. 3. Sort by frequency: O(n log n). Total: O(p + n log n). Problem: When prefix “a” has millions of descendants, n is huge and this is too slow for real-time query serving (must be < 100ms).

What is the optimized trie approach and how does it achieve O(p) query time?
?
Store the top-k results at every node during trie construction (offline build phase). Each node caches: top_queries = [(query, freq), ...] for the top-k most frequent queries that share this prefix. At query time: navigate to prefix node in O(p), return cached top_queries in O(1). Total: O(p) ≈ O(1) in practice (prefix length ≤ 20 chars). Trade-off: uses more memory (k results stored at every node) but query is near-instant.

What is the space vs speed trade-off of the optimized trie?
?
Naive trie: O(n) memory (only store the tree), O(p + n log n) query time. Optimized trie: O(n × k) memory (store top-k at each of n nodes), O(p) query time. Memory cost: 1M nodes × 5 results × 50 bytes = 250 MB — very acceptable. Query speedup: from seconds (DFS + sort) to microseconds (array lookup). Always use optimized trie in production — the memory cost is justified.

What are the two services in a search autocomplete system and why are they separate?
?
Data Gathering Service: Logs all search queries, aggregates counts weekly, builds trie, writes to storage. Batch/offline, write-heavy, can run slowly (hours for rebuild). Query Service: Serves top-k suggestions for any prefix in < 100ms. Online/real-time, read-heavy, must be always available. They’re separate because their update cadence, scaling needs, and latency requirements are completely different. Mixing them would make both worse.

Why update autocomplete suggestions weekly instead of in real-time?
?

  1. Trie rebuild is computationally expensive — takes hours on large datasets. 2. Updating a single query requires recomputing top-k at ALL ancestor nodes (O(p) node updates, each with re-sort). 3. At 100M queries/day, real-time updates would be a constant expensive operation. 4. Search trends don’t meaningfully change within hours — weekly is fresh enough. 5. Simpler architecture: full weekly rebuild vs complex incremental update logic. For breaking news trends, a lightweight “trending” overlay can supplement the weekly trie.

How is the trie serialized to a key-value store for distributed serving?
?
Each prefix string becomes a key, its top-k suggestions become the value. Examples: “tw” → [(“twitter”,9000), (“twitch”,8500), …], “twi” → [(“twitter”,9000), …]. Serialized trie stored in HBase/DynamoDB (persistent) and Redis (cache). Query service just does redis.get(prefix) — O(1) lookup, no need for in-memory trie structure. This allows any query server to serve suggestions without loading the full 100 GB trie into memory.

What is the data gathering pipeline for autocomplete, step by step?
?

  1. Log Collection: Every search query logged to Kafka with query text and timestamp. 2. Weekly Batch Aggregation (Spark): Group by query, count occurrences, deduplicate by user. 3. Frequency Decay: new_freq = (1-decay) × old_freq + decay × weekly_count — prevents old queries from dominating forever. 4. Filtering: Remove offensive content (blocklist + ML classifier), spam, PII. 5. Trie Builder: Build optimized trie from top-N queries, serialize to KV store. 6. Cache Refresh: Load new trie data into Redis, expire old cache.

What does the frequency decay formula do and why is it important?
?
Formula: new_frequency = (1 - decay_factor) × old_frequency + decay_factor × weekly_count. Decay factor typically 0.1-0.5. Purpose: Prevents permanently popular old queries from never being displaced. Without decay, a query that was hugely popular 2 years ago would forever outrank newer trending queries. With decay, frequency of old queries gradually decreases each week unless they continue being searched. Tunes how responsive suggestions are to trend changes — higher decay = faster trend response.

How does the query service handle a cache miss?
?

  1. Query server checks Redis Trie Cache with prefix as key. 2. Cache HIT → Return top-k results immediately (< 1ms). 3. Cache MISS → Query Trie DB (HBase/DynamoDB) with same key. 4. Return results to client AND store in Redis cache for future requests. 5. Set appropriate TTL on cached result (e.g., 24 hours, refreshed weekly). Cache miss rate should be very low because the trie is pre-loaded into Redis after each weekly rebuild.

How do client-side optimizations reduce autocomplete server load?
?

  1. Debouncing: Only send request after 100ms of no typing (user pausing). Cuts requests by ~50% since users often type multiple characters quickly. 2. Browser cache: Store prefix → results in memory. “tw” and “twi” both hit server, but “tw” again (backspace) serves from cache. 3. Minimum prefix length: Don’t query for 1-character prefixes (too many results, too broad). Start at 2 characters. 4. Request cancellation: Cancel in-flight request when user types more characters. Prevents out-of-order responses.

How do you shard a trie for scale?
?
Option 1 - By first character: a-h on Shard 1, i-p on Shard 2, q-z on Shard 3. Simple but uneven (popular letters like ‘s’ create hot shards). Option 2 - By prefix range (weighted splits): Analyze query distribution, split so each shard handles equal query volume. More even but complex. Option 3 - Redis Cluster (recommended): Prefix string is the Redis key, Redis Cluster automatically distributes keys across nodes using consistent hashing. No manual sharding logic needed — let Redis handle it.

How do you filter offensive content from autocomplete suggestions?
?
Multi-layer approach: 1. Blocklist filter: Maintain list of known offensive/profane terms. Remove any query containing blocked terms during weekly batch build. Fast, handles known bad content. 2. ML classifier: Trained on labeled data to detect novel offensive patterns not in blocklist. Run on new queries in data pipeline. 3. Real-time filter: Regex check on suggestions before returning to client (final safety net). 4. User reporting: Allow users to flag inappropriate suggestions, use as training data for ML model. Defense in depth.

How does personalization work as an extension to autocomplete?
?
Two-tier merge at query time: Global layer: weekly trie with population-level popular queries. Personal layer: Per-user Redis sorted set storing user’s recent queries (last 30 days) with search counts. Query serving: 1. Get user’s top matching personal queries for prefix (from Redis). 2. Get global top-5 for prefix (from trie). 3. Merge: show up to 2 personal results + fill remaining with global. Cold users (no history) fall back to 100% global. Personal layer is small per-user, fast to query.

What is the time complexity of building the optimized trie?
?
Phase 1 - Insert all queries: O(M × L) where M = number of queries and L = average query length. Each query traverses/creates at most L nodes. Phase 2 - Compute top-k at each node (bottom-up DFS): O(N × k log k) where N = total nodes and k = number of suggestions to cache. At each node, merge child lists and sort. Total build time: O(M × L + N × k log k). This runs offline (weekly), so hours of compute time is acceptable.

How does multi-language support work in autocomplete?
?
Maintain separate trie per language: english_trie, spanish_trie, chinese_trie, etc. Route query to correct trie based on: 1. User’s browser/app language setting (Accept-Language header). 2. Detected input language (character set analysis). 3. User’s profile language preference. Chinese/Japanese/Korean use character-based tries (not romanized). Build each language trie independently in the weekly batch pipeline. Cache all language tries in Redis with language prefix in key: en:tw, es:tw.

What are the key differences between naive trie and optimized trie in terms of when top-k is computed?
?
Naive trie: Top-k computed at query time (online, real-time). For each user request: navigate to prefix, DFS all descendants, sort by frequency, return top-k. CPU-intensive at serve time. Bad for high QPS. Optimized trie: Top-k computed at build time (offline, weekly batch). Precomputed during trie construction and stored at each node. At query time: just read the precomputed list. All computation moved offline. This is the “pay once during build, benefit on every query” optimization.

What data does each node in an optimized trie store?
?
Each TrieNode stores: 1. children: dict mapping character → child TrieNode. 2. is_word: boolean, true if a valid query ends at this node. 3. frequency: int, search count for this exact query (0 if not a word). 4. top_queries: list of top-k (query_string, frequency) tuples for this prefix. The top_queries field is the key optimization — it caches the answer for “what are the top suggestions for this prefix?” so query time is just navigation + list read.

What is the complete query flow from user typing to suggestions appearing?
?

  1. User types character → browser waits 100ms (debounce). 2. Browser checks local cache for this prefix → HIT returns instantly. 3. MISS → HTTP GET /autocomplete?prefix=tw sent to server. 4. Load balancer routes to query server. 5. Query server checks Redis: redis.get("tw"). 6. Redis HIT → return JSON array of top-5 suggestions. 7. Redis MISS → query HBase/DynamoDB → return results + cache in Redis. 8. Response reaches client in < 100ms. 9. Browser caches result locally.

How does the autocomplete system handle the “no results” case?
?
If navigating the trie reaches a node with no top_queries (or the prefix doesn’t exist in trie): return empty array []. Don’t return generic popular queries for unmatched prefixes — it looks odd to see “Twitter” when user typed “xq”. The trie only contains top-N queries (say top 5 billion); very rare prefixes legitimately have no suggestions. Client-side: hide the suggestion dropdown when response is empty. Note: a prefix being absent from the trie does NOT mean the word doesn’t exist — only that no popular query starts with it.

What storage systems are used in autocomplete and why?
?
Kafka: Ingest search logs in real-time, buffer for weekly batch processing. High-throughput append-only log. Trie DB (HBase/DynamoDB): Persistent serialized trie. Survives Redis restarts. KV structure: prefix → top-k. Redis: Trie cache for low-latency serving. In-memory, O(1) lookup by prefix key. Relational DB (MySQL): Store filtered query stats, blocklist of offensive terms, pipeline metadata. Object Storage (S3): Raw search logs archive before Spark processing.

What is the QPS and scale math for a 10M DAU autocomplete system?
?
DAU: 10M. Searches per user per day: 10. Total searches/day: 100M. Characters per search: ~20 → autocomplete requests per search: ~20. Total autocomplete QPS: 100M × 20 / 86400 ≈ 23K QPS (avg), ~50K peak. Each query server handles ~1K-5K QPS (depends on hardware). Need 10-50 query servers. Redis can handle 100K+ ops/sec per instance → 1 Redis cluster sufficient. Storage: 100M searches/day × ~200 bytes = 20 GB/day of raw logs → Spark aggregates to ~1 GB of trie data weekly.

How does the system prevent a single query server from becoming a hot spot?
?

  1. Redis Cluster: Prefix keys distributed across Redis shards by Redis Cluster’s consistent hashing — no single Redis node handles all “tw*” queries. 2. Load balancer: Distributes requests evenly across query servers using round-robin or least-connections. 3. Client-side caching: Reduces repeated server hits for same prefixes — browser caches cut server QPS by 30-50%. 4. CDN for static suggestions: For very popular prefixes (single characters, common words), pre-generate and cache at CDN edge nodes. 5. Read replicas: Multiple Redis replicas for reads.

What happens when the weekly trie rebuild completes and needs to be swapped in?
?
Blue-green deployment for trie data: While old trie serves traffic, new trie is being built. When new trie is ready: 1. Write new trie data to Trie DB (HBase). 2. Load new data into a shadow Redis instance (not yet serving traffic). 3. Validate: run sample queries on shadow instance, compare results with old trie. 4. Atomic swap: update load balancer / service discovery to point query servers at new Redis. 5. Old Redis kept for 24h in case rollback needed. This ensures zero-downtime trie updates.

What are the 7 key takeaways from the autocomplete system design?
?

  1. Optimized trie (top-k at each node) = O(p) query time — the core insight. 2. Two services: data gathering (weekly batch) vs query serving (real-time). 3. Serialize trie to KV store so any server can query without loading full trie. 4. Redis as trie cache — prefix string is key, top-k is value, O(1) lookup. 5. Weekly rebuild is fine — trends don’t change hourly. 6. Client debounce + cache reduces server load by 50%+. 7. Redis Cluster handles sharding automatically via key-based distribution.

Total Cards: 25
Review Time: 20-25 minutes
Priority: HIGH - Common Medium-Hard interview question!
Last Updated: 2026-04-13