System Design: Build a Google-Scale Autocomplete With Prefix Matching, Typo Tolerance, and Personalization Ranking
The Problem
Autocomplete is one of the most common MAANG frontend system design questions. A basic trie gets you an L4 answer. An L6 answer adds weighted ranking, fuzzy matching for typos, phonetic search, personalization, prefix sharding, debounce + speculative prefetch, and a unified scoring algorithm. This article builds the complete system.
Architecture Overview
┌────────────────────────────────────────────────────────────────┐
│ CLIENT                                                         │
│                                                                │
│  ┌───────────┐    ┌──────────┐    ┌──────────────────────────┐ │
│  │ Input Box │───▶│ Debounce │───▶│ Speculative Prefetcher   │ │
│  └───────────┘    │ (150ms)  │    │ (predict next keystroke) │ │
│                   └──────────┘    └────────────┬─────────────┘ │
│                                                │               │
│              ┌─────────────────────────────────▼────────────┐  │
│              │ Local Cache (LRU)                            │  │
│              │ "jav" → [{java, 0.95}, {javascript, ...}]    │  │
│              └─────────────────────────────────┬────────────┘  │
│                                                │ miss          │
├────────────────────────────────────────────────┼───────────────┤
│ NETWORK                                        ▼               │
│              ┌──────────────────────────────────────────────┐  │
│              │ API Gateway / CDN Edge                       │  │
│              └───────────────────────┬──────────────────────┘  │
│                                      │                         │
│              ┌───────────────────────▼──────────────────────┐  │
│              │ Autocomplete Service                         │  │
│              │                                              │  │
│              │  1. Prefix Match    (Weighted Trie)          │  │
│              │  2. Fuzzy Match     (BK-Tree)                │  │
│              │  3. Phonetic Match  (Soundex)                │  │
│              │  4. Personalization (User Model)             │  │
│              │  5. Ranking         (Blended Score)          │  │
│              └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
Weighted Trie: Frequency-Based Ranking
A standard trie stores strings. A WEIGHTED trie also stores
the frequency (or search count) of each complete word,
and propagates the MAX frequency up the tree for fast top-k.
Example: "java" (freq 90), "javascript" (freq 85), "jazz" (freq 40)
root (max=90)
 └─ j (max=90)
     └─ a (max=90)
         ├─ v (max=90)
         │    └─ a (freq=90, max=90)              ← "java" complete
         │         └─ s (max=85)
         │              └─ c → r → i → p → t (85) ← "javascript" complete
         └─ z (max=40)
              └─ z (freq=40)                      ← "jazz" complete
QUERYING "ja":
1. Walk to node "j" → "a"
2. Collect all completions from here using DFS/BFS
3. Sort by frequency (or use a priority queue)
4. Return top-k
OPTIMIZATION: Store the top-k completions at each node,
precomputed during insert. Trade space for query speed.
Query becomes O(|prefix| + k) instead of O(subtree size).
interface TrieNode {
  children: Map<string, TrieNode>;
  isEnd: boolean;
  frequency: number;
  maxFrequency: number; // Max frequency in subtree (for pruning)
  topCompletions: Array<{ word: string; score: number }>; // Precomputed top-k
}
class WeightedTrie {
  private root: TrieNode;
  private topK: number;

  constructor(topK: number = 10) {
    this.root = this.createNode();
    this.topK = topK;
  }

  insert(word: string, frequency: number): void {
    let node = this.root;
    for (const char of word) {
      if (!node.children.has(char)) {
        node.children.set(char, this.createNode());
      }
      node = node.children.get(char)!;
      node.maxFrequency = Math.max(node.maxFrequency, frequency);
    }
    node.isEnd = true;
    node.frequency = Math.max(node.frequency, frequency);
    // Update top-k completions along the path:
    this.updateTopCompletions(word, frequency);
  }

  search(prefix: string, limit: number = 10): Array<{ word: string; score: number }> {
    const node = this.findNode(prefix);
    if (!node) return [];
    // Fast path: use precomputed top-k if available
    if (node.topCompletions.length > 0) {
      return node.topCompletions.slice(0, limit);
    }
    // Slow path: collect with priority queue
    return this.collectTopK(node, prefix, limit);
  }

  private findNode(prefix: string): TrieNode | null {
    let node = this.root;
    for (const char of prefix) {
      if (!node.children.has(char)) return null;
      node = node.children.get(char)!;
    }
    return node;
  }

  // Priority-queue-based collection with pruning:
  private collectTopK(
    startNode: TrieNode,
    prefix: string,
    limit: number
  ): Array<{ word: string; score: number }> {
    const results: Array<{ word: string; score: number }> = [];
    // Max-heap based on maxFrequency (for pruning):
    const queue: Array<{ node: TrieNode; path: string }> = [
      { node: startNode, path: prefix },
    ];
    let minScore = 0;
    while (queue.length > 0) {
      // Sort by maxFrequency descending (simulated priority queue):
      queue.sort((a, b) => b.node.maxFrequency - a.node.maxFrequency);
      const { node, path } = queue.shift()!;
      // Prune: if this subtree's max can't beat our worst top-k, skip it
      if (results.length >= limit && node.maxFrequency < minScore) {
        continue;
      }
      if (node.isEnd) {
        results.push({ word: path, score: node.frequency });
        results.sort((a, b) => b.score - a.score);
        if (results.length > limit) results.length = limit;
        minScore = results.length >= limit ? results[results.length - 1].score : 0;
      }
      for (const [char, child] of node.children) {
        if (results.length < limit || child.maxFrequency >= minScore) {
          queue.push({ node: child, path: path + char });
        }
      }
    }
    return results;
  }

  private updateTopCompletions(word: string, frequency: number): void {
    let node = this.root;
    for (const char of word) {
      node = node.children.get(char)!;
      const existing = node.topCompletions.findIndex((c) => c.word === word);
      if (existing !== -1) {
        node.topCompletions[existing].score = Math.max(
          node.topCompletions[existing].score,
          frequency
        );
      } else {
        node.topCompletions.push({ word, score: frequency });
      }
      node.topCompletions.sort((a, b) => b.score - a.score);
      if (node.topCompletions.length > this.topK) {
        node.topCompletions.length = this.topK;
      }
    }
  }

  private createNode(): TrieNode {
    return {
      children: new Map(),
      isEnd: false,
      frequency: 0,
      maxFrequency: 0,
      topCompletions: [],
    };
  }
}
BK-Tree: Edit-Distance Fuzzy Matching
A BK-tree organizes strings by EDIT DISTANCE for fast
fuzzy search. Finding all strings within edit distance d
of a query is much faster than checking every string.
STRUCTURE:
Each node has children at edges labeled with the edit distance
from child to parent.
Insert "book": root = "book"
Insert "books": dist("book","books") = 1 → child at edge 1
Insert "boo": dist("book","boo") = 1 → edge 1 taken!
→ go to "books", dist("books","boo") = 2
→ child of "books" at edge 2
Insert "cook": dist("book","cook") = 1 → edge 1 taken!
  → go to "books", dist("books","cook") = 2
  → edge 2 taken! → go to "boo"
  → dist("boo","cook") = 2 → child of "boo" at edge 2
"book"
 └─ 1: "books"
        └─ 2: "boo"
               └─ 2: "cook"
QUERY "bok" (max distance 1):
  dist("book", "bok") = 1 ≤ 1 → MATCH ✓
  Check children at edges [1-1, 1+1] = [0, 2]
  Only edge 1 exists → check "books"
    dist("books","bok") = 2 > 1 → no match
    Children of "books" at edges [2-1, 2+1] = [1, 3] → edge 2 → check "boo"
      dist("boo","bok") = 1 ≤ 1 → MATCH ✓
      Children of "boo" at edges [0, 2] → edge 2 → check "cook"
        dist("cook","bok") = 2 > 1 → no match
Result: ["book", "boo"]
The TRIANGLE INEQUALITY prunes the search:
If dist(query, node) = d and the max allowed distance is k,
children at edge e are only relevant if d - k ≤ e ≤ d + k.
class BKTree {
  private root: BKNode | null = null;

  insert(word: string, data?: any): void {
    if (!this.root) {
      this.root = { word, data, children: new Map() };
      return;
    }
    let current = this.root;
    while (true) {
      const dist = levenshteinDistance(current.word, word);
      if (dist === 0) return; // Duplicate
      if (!current.children.has(dist)) {
        current.children.set(dist, { word, data, children: new Map() });
        return;
      }
      current = current.children.get(dist)!;
    }
  }

  search(query: string, maxDistance: number): Array<{ word: string; distance: number }> {
    if (!this.root) return [];
    const results: Array<{ word: string; distance: number }> = [];
    const stack: BKNode[] = [this.root];
    while (stack.length > 0) {
      const node = stack.pop()!;
      const dist = levenshteinDistance(query, node.word);
      if (dist <= maxDistance) {
        results.push({ word: node.word, distance: dist });
      }
      // Triangle inequality pruning:
      const low = dist - maxDistance;
      const high = dist + maxDistance;
      for (const [edgeDist, child] of node.children) {
        if (edgeDist >= low && edgeDist <= high) {
          stack.push(child);
        }
      }
    }
    return results.sort((a, b) => a.distance - b.distance);
  }
}

interface BKNode {
  word: string;
  data?: any;
  children: Map<number, BKNode>;
}
function levenshteinDistance(a: string, b: string): number {
  const m = a.length;
  const n = b.length;
  let prev = new Uint16Array(n + 1);
  let curr = new Uint16Array(n + 1);
  for (let j = 0; j <= n; j++) prev[j] = j;
  for (let i = 1; i <= m; i++) {
    curr[0] = i;
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution
      );
    }
    [prev, curr] = [curr, prev];
  }
  return prev[n];
}
Phonetic Matching: Soundex and Metaphone
function soundex(word: string): string {
  if (!word) return "";
  const upper = word.toUpperCase();
  let code = upper[0];
  const map: Record<string, string> = {
    B: "1", F: "1", P: "1", V: "1",
    C: "2", G: "2", J: "2", K: "2", Q: "2", S: "2", X: "2", Z: "2",
    D: "3", T: "3",
    L: "4",
    M: "5", N: "5",
    R: "6",
  };
  // Simplified Soundex: vowels and any unmapped letter (including H/W)
  // reset the previous code. Classic Soundex treats H and W specially,
  // but this variant is close enough for autocomplete recall.
  let lastCode = map[upper[0]] || "";
  for (let i = 1; i < upper.length; i++) {
    const charCode = map[upper[i]];
    if (charCode && charCode !== lastCode) {
      code += charCode;
      if (code.length === 4) break;
    }
    lastCode = charCode || "";
  }
  return (code + "000").slice(0, 4);
}

// Build phonetic index:
class PhoneticIndex {
  private index = new Map<string, Set<string>>();

  add(word: string): void {
    const code = soundex(word);
    if (!this.index.has(code)) this.index.set(code, new Set());
    this.index.get(code)!.add(word);
  }

  search(query: string): string[] {
    const code = soundex(query);
    return [...(this.index.get(code) || [])];
  }
}
Personalization: LRU + Frequency Hybrid
class PersonalizationLayer {
  private recentSearches: Map<string, number> = new Map();
  private searchFrequency: Map<string, number> = new Map();
  private maxRecent: number;
  private decayFactor: number;

  constructor(maxRecent: number = 100, decayFactor: number = 0.95) {
    this.maxRecent = maxRecent;
    this.decayFactor = decayFactor;
  }

  recordSearch(query: string): void {
    const freq = this.searchFrequency.get(query) || 0;
    this.searchFrequency.set(query, freq + 1);
    // Re-inserting moves the query to the newest position (Map keeps insertion order):
    this.recentSearches.delete(query);
    this.recentSearches.set(query, Date.now());
    if (this.recentSearches.size > this.maxRecent) {
      const oldest = this.recentSearches.keys().next().value;
      if (oldest !== undefined) this.recentSearches.delete(oldest);
    }
  }

  getPersonalizationScore(candidate: string): number {
    let score = 0;
    // Recency boost: keys iterate oldest → newest, so the most recent
    // search gets decayFactor^0 and older searches decay geometrically:
    const entries = [...this.recentSearches.keys()];
    const recencyIndex = entries.indexOf(candidate);
    if (recencyIndex !== -1) {
      const stepsFromNewest = entries.length - 1 - recencyIndex;
      score += Math.pow(this.decayFactor, stepsFromNewest) * 50;
    }
    // Frequency boost (log-scaled so repeat searches don't dominate):
    const freq = this.searchFrequency.get(candidate) || 0;
    score += Math.log1p(freq) * 10;
    return score;
  }
}
Unified Ranking Algorithm
THE BLENDED SCORE:
══════════════════
score = 0.30 × globalPopularity
      + 0.15 × recencyBoost
      + 0.25 × personalizationScore
      + 0.15 × exactPrefixBonus
      + 0.10 × (1 / (1 + typoDistance))
      + 0.05 × phoneticMatchBonus
(recencyBoost is produced inside the PersonalizationLayer, so the code below
folds it into the personalization term rather than computing it separately.)
class AutocompleteRanker {
  private trie: WeightedTrie;
  private bkTree: BKTree;
  private personalization: PersonalizationLayer;
  private phoneticIndex: PhoneticIndex;

  constructor(
    trie: WeightedTrie,
    bkTree: BKTree,
    personalization: PersonalizationLayer,
    phoneticIndex: PhoneticIndex
  ) {
    this.trie = trie;
    this.bkTree = bkTree;
    this.personalization = personalization;
    this.phoneticIndex = phoneticIndex;
  }

  rank(query: string, limit: number = 10): ScoredCandidate[] {
    const candidates = new Map<string, { source: string; distance: number }>();
    // Exact prefix matches:
    for (const match of this.trie.search(query, limit * 2)) {
      candidates.set(match.word, { source: "prefix", distance: 0 });
    }
    // Fuzzy matches (skip very short queries, where everything is 1-2 edits away):
    if (query.length >= 3) {
      const maxDist = query.length <= 4 ? 1 : 2;
      for (const match of this.bkTree.search(query, maxDist)) {
        if (!candidates.has(match.word)) {
          candidates.set(match.word, { source: "fuzzy", distance: match.distance });
        }
      }
    }
    // Phonetic matches:
    const queryPhonetic = soundex(query);
    for (const word of this.phoneticIndex.search(query)) {
      if (!candidates.has(word)) {
        candidates.set(word, { source: "phonetic", distance: 2 });
      }
    }
    // Score each candidate:
    const scored: ScoredCandidate[] = [];
    for (const [word, { distance }] of candidates) {
      const trieResults = this.trie.search(word, 1);
      // Assumes raw frequencies are normalized to a 0-100 scale:
      const globalPop = trieResults.length > 0 ? trieResults[0].score / 100 : 0;
      const personal = this.personalization.getPersonalizationScore(word);
      const isExactPrefix = word.toLowerCase().startsWith(query.toLowerCase());
      const phoneticScore = soundex(word) === queryPhonetic ? 1 : 0;
      // The 0.15 recencyBoost weight from the formula is folded into the
      // personalization term, which already applies recency decay:
      const score =
        0.30 * globalPop +
        0.25 * (personal / 100) +
        0.15 * (isExactPrefix ? 1 : 0) +
        0.10 * (1 / (1 + distance)) +
        0.05 * phoneticScore;
      scored.push({ word, score });
    }
    return scored.sort((a, b) => b.score - a.score).slice(0, limit);
  }
}

interface ScoredCandidate {
  word: string;
  score: number;
}
Frontend: Debounce + Speculative Prefetch
class AutocompleteClient {
  private cache = new LRUCache<string, ScoredCandidate[]>(200);
  private debounceTimer: ReturnType<typeof setTimeout> | null = null;
  private inflightRequest: AbortController | null = null;

  async getSuggestions(query: string, debounceMs = 150): Promise<ScoredCandidate[]> {
    const cached = this.cache.get(query);
    if (cached) return cached;
    this.inflightRequest?.abort();
    // Note: if a newer keystroke resets the timer or aborts the fetch, the
    // superseded call's promise never settles. Fine for fire-and-forget UI
    // updates, but callers should not await these promises sequentially.
    return new Promise((resolve) => {
      if (this.debounceTimer) clearTimeout(this.debounceTimer);
      this.debounceTimer = setTimeout(async () => {
        try {
          const controller = new AbortController();
          this.inflightRequest = controller;
          const response = await fetch(
            `/api/autocomplete?q=${encodeURIComponent(query)}`,
            { signal: controller.signal }
          );
          const results: ScoredCandidate[] = await response.json();
          this.cache.set(query, results);
          this.speculativePrefetch(query);
          resolve(results);
        } catch (e) {
          if ((e as Error).name !== "AbortError") resolve([]);
        }
      }, debounceMs);
    });
  }

  private speculativePrefetch(query: string): void {
    // Prefetch likely next keystrokes (English letter-frequency order):
    const nextChars = "etaoinsrhld".split("");
    const targets = nextChars
      .map((c) => query + c)
      .filter((q) => !this.cache.has(q))
      .slice(0, 3);
    if ("requestIdleCallback" in globalThis) {
      requestIdleCallback(() => {
        for (const target of targets) {
          fetch(`/api/autocomplete?q=${encodeURIComponent(target)}`)
            .then((r) => r.json())
            .then((results) => this.cache.set(target, results))
            .catch(() => {});
        }
      });
    }
  }
}
class LRUCache<K, V> {
  private map = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    // Re-insert to mark as most recently used:
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  has(key: K): boolean { return this.map.has(key); }
}
Prefix-Based Sharding
DISTRIBUTED AUTOCOMPLETE:
═════════════════════════
Shard A: a-f    Shard B: g-m    Shard C: n-s    Shard D: t-z
Query "javascript" → route to Shard B (j ∈ g-m)
Hot key problem: "the", "to", "that" all go to Shard D.
Solution: Sub-shard hot prefixes (D1: ta-th, D2: ti-tz).
Each shard: 3 replicas. Read from any (eventual consistency fine for autocomplete).
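The routing rule above can be sketched as a small range lookup. Everything here is illustrative — the shard names, letter ranges, and the D1/D2 sub-shard split are the assumptions from this section, not a real deployment:

```typescript
// Prefix ranges, checked in order: hot-prefix sub-shards (D1, D2) are
// listed before the broad "t-z" fallback so they win for "th"/"to" queries.
const SHARDS: Array<{ name: string; from: string; to: string }> = [
  { name: "A", from: "a", to: "f" },
  { name: "B", from: "g", to: "m" },
  { name: "C", from: "n", to: "s" },
  { name: "D1", from: "ta", to: "th" }, // sub-shard for hot "t" prefixes
  { name: "D2", from: "ti", to: "tz" },
  { name: "D", from: "t", to: "z" },    // fallback for the rest of t-z
];

function routeShard(query: string): string {
  const q = query.toLowerCase();
  for (const shard of SHARDS) {
    // Compare a prefix of the same length as the range bounds:
    const prefix = q.slice(0, shard.from.length);
    if (prefix >= shard.from && prefix <= shard.to) return shard.name;
  }
  return "A"; // default shard for non-alphabetic input
}
```

With this table, "javascript" routes to B, while the hot queries "the" and "to" land on the D1 and D2 sub-shards instead of overloading D.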
Interview Q&A
Q: Why use a BK-tree for fuzzy matching instead of computing Levenshtein distance against every word?
A: Brute-force Levenshtein against N words is O(N × L²) per query. A BK-tree exploits the triangle inequality: if dist(query, node) = d and max distance is k, only children at edges [d-k, d+k] are checked. For 100K words with max distance 2, a BK-tree examines ~5-10% of nodes, dropping from O(N) to approximately O(N^(2/3)) comparisons.
Q: How do you handle the ranking tradeoff between global popularity and personalization?
A: Use adaptive weights: w_personal = min(0.4, 0.05 × search_history_count). New users get pure global ranking; users with 8+ searches get 40% personalization. Apply recency decay so old searches lose influence. This prevents cold-start problems while rewarding returning users with relevant suggestions.
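As a sketch, the adaptive rule in this answer is just a capped linear ramp. The function names and the renormalized two-term blend are illustrative, not part of the article's ranker:

```typescript
// Personalization weight grows with search history and caps at 0.4.
function personalizationWeight(searchHistoryCount: number): number {
  return Math.min(0.4, 0.05 * searchHistoryCount);
}

// Blend global popularity with the personal score, renormalizing so the
// two weights always sum to 1 (new users get pure global ranking).
function adaptiveBlend(globalPop: number, personal: number, historyCount: number): number {
  const wPersonal = personalizationWeight(historyCount);
  return (1 - wPersonal) * globalPop + wPersonal * personal;
}
```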
Q: Why debounce on the client and not just on the server?
A: Client debounce prevents sending 4 requests for "java" typed in 200ms. A 150ms debounce sends only "java". Server-side throttling is also needed for DDoS protection, but client debounce is the first line of defense. Combine it with speculative prefetching: after "java" arrives, prefetch "javas"/"javac" during idle time so the next keystroke is instant from cache.
Q: How would you design autocomplete for 1 billion queries/day?
A: (1) Shard the trie by prefix across hundreds of servers. (2) Cache hot prefixes at the CDN edge (the top-1000 prefixes serve 80%+ of traffic). (3) Precompute top-10 for all 2-3 char prefixes and serve from memory. (4) Update popularity scores asynchronously via a streaming pipeline. (5) Use read replicas — autocomplete tolerates seconds of staleness. Budget: 100 shards × 3 replicas = 300 servers; after the CDN absorbs 80%, roughly 2,300 QPS reach the origin, or about 8 QPS per replica.
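The capacity arithmetic behind that budget, as a quick check (all inputs are the stated assumptions, not measurements):

```typescript
// Back-of-envelope capacity check for 1B queries/day.
const queriesPerDay = 1e9;
const avgQps = queriesPerDay / 86_400;       // ≈ 11,574 QPS average
const cdnHitRate = 0.8;                      // CDN absorbs 80% of traffic
const originQps = avgQps * (1 - cdnHitRate); // ≈ 2,315 QPS reach the origin
const servers = 100 * 3;                     // 100 shards × 3 replicas
const qpsPerReplica = originQps / servers;   // ≈ 7.7 QPS per replica
```

Note this uses the daily average; peak traffic is typically 2-3× average, so per-replica headroom matters more than these numbers suggest.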