System Design: Build a Google-Scale Autocomplete With Prefix Matching, Typo Tolerance, and Personalization Ranking
The Problem
Autocomplete is one of the most common MAANG frontend system design questions. A basic trie gets you an L4 answer. An L6 answer adds weighted ranking, fuzzy matching for typos, phonetic search, personalization, prefix sharding, debounce + speculative prefetch, and a unified scoring algorithm. This article builds the complete system.
Architecture Overview
┌────────────────────────────────────────────────────────────────┐
│ CLIENT                                                         │
│                                                                │
│  ┌───────────┐    ┌──────────┐    ┌──────────────────────────┐ │
│  │ Input Box │───▶│ Debounce │───▶│ Speculative Prefetcher   │ │
│  └───────────┘    │ (150ms)  │    │ (predict next keystroke) │ │
│                   └──────────┘    └────────────┬─────────────┘ │
│                                                │               │
│              ┌─────────────────────────────────▼────────────┐  │
│              │ Local Cache (LRU)                            │  │
│              │ "jav" → [{java, 0.95}, {javascript, ...}]    │  │
│              └─────────────────────────────────┬────────────┘  │
│                                                │ miss          │
├────────────────────────────────────────────────┼───────────────┤
│ NETWORK                                        ▼               │
│              ┌──────────────────────────────────────────────┐  │
│              │ API Gateway / CDN Edge                       │  │
│              └───────────────────────┬──────────────────────┘  │
│                                      │                         │
│              ┌───────────────────────▼──────────────────────┐  │
│              │ Autocomplete Service                         │  │
│              │                                              │  │
│              │  1. Prefix Match    (Weighted Trie)          │  │
│              │  2. Fuzzy Match     (BK-Tree)                │  │
│              │  3. Phonetic Match  (Soundex)                │  │
│              │  4. Personalization (User Model)             │  │
│              │  5. Ranking         (Blended Score)          │  │
│              └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
Weighted Trie: Frequency-Based Ranking
A standard trie stores strings. A WEIGHTED trie also stores
the frequency (or search count) of each complete word,
and propagates the MAX frequency up the tree for fast top-k.
Example: "java" (freq 90), "javascript" (freq 85), "jazz" (freq 40)
root (max=90)
 └─ j (max=90)
     └─ a (max=90)
         ├─ v (max=90)
         │    └─ a (freq=90, max=90)              ← "java" complete
         │         └─ s (max=85)
         │              └─ c → r → i → p → t (85) ← "javascript" complete
         └─ z (max=40)
              └─ z (freq=40)                      ← "jazz" complete
QUERYING "ja":
1. Walk to node "j" → "a"
2. Collect all completions from here using DFS/BFS
3. Sort by frequency (or use a priority queue)
4. Return top-k
OPTIMIZATION: Store the top-k completions at each node,
precomputed during insert. Trade space for query speed.
Query becomes O(|prefix| + k) instead of O(subtree size).
interface TrieNode {
  children: Map<string, TrieNode>;
  isEnd: boolean;
  frequency: number;
  maxFrequency: number; // Max frequency in subtree (for pruning)
  topCompletions: Array<{ word: string; score: number }>; // Precomputed top-k
}
class WeightedTrie {
  private root: TrieNode;
  private topK: number;

  constructor(topK: number = 10) {
    this.root = this.createNode();
    this.topK = topK;
  }

  insert(word: string, frequency: number): void {
    let node = this.root;
    for (const char of word) {
      if (!node.children.has(char)) {
        node.children.set(char, this.createNode());
      }
      node = node.children.get(char)!;
      node.maxFrequency = Math.max(node.maxFrequency, frequency);
    }
    node.isEnd = true;
    node.frequency = Math.max(node.frequency, frequency);
    // Update top-k completions along the path:
    this.updateTopCompletions(word, frequency);
  }

  search(prefix: string, limit: number = 10): Array<{ word: string; score: number }> {
    const node = this.findNode(prefix);
    if (!node) return [];
    // Fast path: use precomputed top-k if available
    if (node.topCompletions.length > 0) {
      return node.topCompletions.slice(0, limit);
    }
    // Slow path: collect with priority queue
    return this.collectTopK(node, prefix, limit);
  }

  private findNode(prefix: string): TrieNode | null {
    let node = this.root;
    for (const char of prefix) {
      if (!node.children.has(char)) return null;
      node = node.children.get(char)!;
    }
    return node;
  }

  // Priority-queue-based collection with pruning:
  private collectTopK(
    startNode: TrieNode,
    prefix: string,
    limit: number
  ): Array<{ word: string; score: number }> {
    const results: Array<{ word: string; score: number }> = [];
    // Max-heap based on maxFrequency (for pruning):
    const queue: Array<{ node: TrieNode; path: string }> = [
      { node: startNode, path: prefix },
    ];
    let minScore = 0;
    while (queue.length > 0) {
      // Sort by maxFrequency descending (simulated priority queue):
      queue.sort((a, b) => b.node.maxFrequency - a.node.maxFrequency);
      const { node, path } = queue.shift()!;
      // Prune: if this subtree's max can't beat our worst top-k, skip it
      if (results.length >= limit && node.maxFrequency < minScore) {
        continue;
      }
      if (node.isEnd) {
        results.push({ word: path, score: node.frequency });
        results.sort((a, b) => b.score - a.score);
        if (results.length > limit) results.length = limit;
        minScore = results.length >= limit ? results[results.length - 1].score : 0;
      }
      for (const [char, child] of node.children) {
        if (results.length < limit || child.maxFrequency >= minScore) {
          queue.push({ node: child, path: path + char });
        }
      }
    }
    return results;
  }

  private updateTopCompletions(word: string, frequency: number): void {
    let node = this.root;
    for (const char of word) {
      node = node.children.get(char)!;
      const existing = node.topCompletions.findIndex((c) => c.word === word);
      if (existing !== -1) {
        node.topCompletions[existing].score = Math.max(
          node.topCompletions[existing].score,
          frequency
        );
      } else {
        node.topCompletions.push({ word, score: frequency });
      }
      node.topCompletions.sort((a, b) => b.score - a.score);
      if (node.topCompletions.length > this.topK) {
        node.topCompletions.length = this.topK;
      }
    }
  }

  private createNode(): TrieNode {
    return {
      children: new Map(),
      isEnd: false,
      frequency: 0,
      maxFrequency: 0,
      topCompletions: [],
    };
  }
}
BK-Tree: Edit-Distance Fuzzy Matching
A BK-tree organizes strings by EDIT DISTANCE for fast
fuzzy search. Finding all strings within edit distance d
of a query is much faster than checking every string.
STRUCTURE:
Each node has children at edges labeled with the edit distance
from child to parent.
Insert "book": root = "book"
Insert "books": dist("book","books") = 1 → child at edge 1
Insert "boo": dist("book","boo") = 1 → edge 1 taken!
→ go to "books", dist("books","boo") = 2
→ child of "books" at edge 2
Insert "cook": dist("book","cook") = 1 → edge 1 taken!
  → go to "books", dist("books","cook") = 2
  → edge 2 taken! → go to "boo"
  → dist("boo","cook") = 2 → child of "boo" at edge 2
"book"
 └─ 1: "books"
        └─ 2: "boo"
               └─ 2: "cook"
QUERY "bok" (max distance 1):
  dist("book", "bok") = 1 ≤ 1 → MATCH ✓
  Check children at edges [1-1, 1+1] = [0, 2]
  Only edge 1 exists → check "books"
    dist("books","bok") = 2 > 1 → no match
    Children of "books" at edges [2-1, 2+1] = [1, 3] → edge 2 → check "boo"
      dist("boo","bok") = 1 ≤ 1 → MATCH ✓
      Children of "boo" at edges [0, 2] → edge 2 → check "cook"
        dist("cook","bok") = 2 > 1 → no match
Result: ["book", "boo"]
The TRIANGLE INEQUALITY prunes the search:
If dist(query, node) = d and the max allowed distance is k,
children at edge e are only relevant if d - k ≤ e ≤ d + k.
class BKTree {
  private root: BKNode | null = null;

  insert(word: string, data?: any): void {
    if (!this.root) {
      this.root = { word, data, children: new Map() };
      return;
    }
    let current = this.root;
    while (true) {
      const dist = levenshteinDistance(current.word, word);
      if (dist === 0) return; // Duplicate
      if (!current.children.has(dist)) {
        current.children.set(dist, { word, data, children: new Map() });
        return;
      }
      current = current.children.get(dist)!;
    }
  }

  search(query: string, maxDistance: number): Array<{ word: string; distance: number }> {
    if (!this.root) return [];
    const results: Array<{ word: string; distance: number }> = [];
    const stack: BKNode[] = [this.root];
    while (stack.length > 0) {
      const node = stack.pop()!;
      const dist = levenshteinDistance(query, node.word);
      if (dist <= maxDistance) {
        results.push({ word: node.word, distance: dist });
      }
      // Triangle inequality pruning:
      const low = dist - maxDistance;
      const high = dist + maxDistance;
      for (const [edgeDist, child] of node.children) {
        if (edgeDist >= low && edgeDist <= high) {
          stack.push(child);
        }
      }
    }
    return results.sort((a, b) => a.distance - b.distance);
  }
}

interface BKNode {
  word: string;
  data?: any;
  children: Map<number, BKNode>;
}
function levenshteinDistance(a: string, b: string): number {
  const m = a.length;
  const n = b.length;
  let prev = new Uint16Array(n + 1);
  let curr = new Uint16Array(n + 1);
  for (let j = 0; j <= n; j++) prev[j] = j;
  for (let i = 1; i <= m; i++) {
    curr[0] = i;
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution
      );
    }
    [prev, curr] = [curr, prev];
  }
  return prev[n];
}
Phonetic Matching: Soundex and Metaphone
function soundex(word: string): string {
  if (!word) return "";
  const upper = word.toUpperCase();
  let code = upper[0];
  const map: Record<string, string> = {
    B: "1", F: "1", P: "1", V: "1",
    C: "2", G: "2", J: "2", K: "2", Q: "2", S: "2", X: "2", Z: "2",
    D: "3", T: "3",
    L: "4",
    M: "5", N: "5",
    R: "6",
  };
  // Simplified Soundex: vowels and any unmapped letter (including H/W)
  // reset the previous code. Classic Soundex treats H and W specially,
  // but this variant is close enough for autocomplete recall.
  let lastCode = map[upper[0]] || "";
  for (let i = 1; i < upper.length; i++) {
    const charCode = map[upper[i]];
    if (charCode && charCode !== lastCode) {
      code += charCode;
      if (code.length === 4) break;
    }
    lastCode = charCode || "";
  }
  return (code + "000").slice(0, 4);
}

// Build phonetic index:
class PhoneticIndex {
  private index = new Map<string, Set<string>>();

  add(word: string): void {
    const code = soundex(word);
    if (!this.index.has(code)) this.index.set(code, new Set());
    this.index.get(code)!.add(word);
  }

  search(query: string): string[] {
    const code = soundex(query);
    return [...(this.index.get(code) || [])];
  }
}
Personalization: LRU + Frequency Hybrid
class PersonalizationLayer {
  private recentSearches: Map<string, number> = new Map();
  private searchFrequency: Map<string, number> = new Map();
  private maxRecent: number;
  private decayFactor: number;

  constructor(maxRecent: number = 100, decayFactor: number = 0.95) {
    this.maxRecent = maxRecent;
    this.decayFactor = decayFactor;
  }

  recordSearch(query: string): void {
    const freq = this.searchFrequency.get(query) || 0;
    this.searchFrequency.set(query, freq + 1);
    // Re-inserting moves the query to the newest position (Map keeps insertion order):
    this.recentSearches.delete(query);
    this.recentSearches.set(query, Date.now());
    if (this.recentSearches.size > this.maxRecent) {
      const oldest = this.recentSearches.keys().next().value;
      if (oldest !== undefined) this.recentSearches.delete(oldest);
    }
  }

  getPersonalizationScore(candidate: string): number {
    let score = 0;
    // Recency boost: keys iterate oldest → newest, so the most recent
    // search gets decayFactor^0 and older searches decay geometrically:
    const entries = [...this.recentSearches.keys()];
    const recencyIndex = entries.indexOf(candidate);
    if (recencyIndex !== -1) {
      const stepsFromNewest = entries.length - 1 - recencyIndex;
      score += Math.pow(this.decayFactor, stepsFromNewest) * 50;
    }
    // Frequency boost (log-scaled so repeat searches don't dominate):
    const freq = this.searchFrequency.get(candidate) || 0;
    score += Math.log1p(freq) * 10;
    return score;
  }
}
Unified Ranking Algorithm
THE BLENDED SCORE:
══════════════════
score = 0.30 × globalPopularity
      + 0.15 × recencyBoost
      + 0.25 × personalizationScore
      + 0.15 × exactPrefixBonus
      + 0.10 × (1 / (1 + typoDistance))
      + 0.05 × phoneticMatchBonus
(recencyBoost is produced inside the PersonalizationLayer, so the code below
folds it into the personalization term rather than computing it separately.)
class AutocompleteRanker {
  private trie: WeightedTrie;
  private bkTree: BKTree;
  private personalization: PersonalizationLayer;
  private phoneticIndex: PhoneticIndex;

  constructor(
    trie: WeightedTrie,
    bkTree: BKTree,
    personalization: PersonalizationLayer,
    phoneticIndex: PhoneticIndex
  ) {
    this.trie = trie;
    this.bkTree = bkTree;
    this.personalization = personalization;
    this.phoneticIndex = phoneticIndex;
  }

  rank(query: string, limit: number = 10): ScoredCandidate[] {
    const candidates = new Map<string, { source: string; distance: number }>();
    // Exact prefix matches:
    for (const match of this.trie.search(query, limit * 2)) {
      candidates.set(match.word, { source: "prefix", distance: 0 });
    }
    // Fuzzy matches (skip very short queries, where everything is 1-2 edits away):
    if (query.length >= 3) {
      const maxDist = query.length <= 4 ? 1 : 2;
      for (const match of this.bkTree.search(query, maxDist)) {
        if (!candidates.has(match.word)) {
          candidates.set(match.word, { source: "fuzzy", distance: match.distance });
        }
      }
    }
    // Phonetic matches:
    const queryPhonetic = soundex(query);
    for (const word of this.phoneticIndex.search(query)) {
      if (!candidates.has(word)) {
        candidates.set(word, { source: "phonetic", distance: 2 });
      }
    }
    // Score each candidate:
    const scored: ScoredCandidate[] = [];
    for (const [word, { distance }] of candidates) {
      const trieResults = this.trie.search(word, 1);
      // Assumes raw frequencies are normalized to a 0-100 scale:
      const globalPop = trieResults.length > 0 ? trieResults[0].score / 100 : 0;
      const personal = this.personalization.getPersonalizationScore(word);
      const isExactPrefix = word.toLowerCase().startsWith(query.toLowerCase());
      const phoneticScore = soundex(word) === queryPhonetic ? 1 : 0;
      // The 0.15 recencyBoost weight from the formula is folded into the
      // personalization term, which already applies recency decay:
      const score =
        0.30 * globalPop +
        0.25 * (personal / 100) +
        0.15 * (isExactPrefix ? 1 : 0) +
        0.10 * (1 / (1 + distance)) +
        0.05 * phoneticScore;
      scored.push({ word, score });
    }
    return scored.sort((a, b) => b.score - a.score).slice(0, limit);
  }
}

interface ScoredCandidate {
  word: string;
  score: number;
}
Frontend: Debounce + Speculative Prefetch
class AutocompleteClient {
  private cache = new LRUCache<string, ScoredCandidate[]>(200);
  private debounceTimer: ReturnType<typeof setTimeout> | null = null;
  private inflightRequest: AbortController | null = null;

  async getSuggestions(query: string, debounceMs = 150): Promise<ScoredCandidate[]> {
    const cached = this.cache.get(query);
    if (cached) return cached;
    this.inflightRequest?.abort();
    // Note: if a newer keystroke resets the timer or aborts the fetch, the
    // superseded call's promise never settles. Fine for fire-and-forget UI
    // updates, but callers should not await these promises sequentially.
    return new Promise((resolve) => {
      if (this.debounceTimer) clearTimeout(this.debounceTimer);
      this.debounceTimer = setTimeout(async () => {
        try {
          const controller = new AbortController();
          this.inflightRequest = controller;
          const response = await fetch(
            `/api/autocomplete?q=${encodeURIComponent(query)}`,
            { signal: controller.signal }
          );
          const results: ScoredCandidate[] = await response.json();
          this.cache.set(query, results);
          this.speculativePrefetch(query);
          resolve(results);
        } catch (e) {
          if ((e as Error).name !== "AbortError") resolve([]);
        }
      }, debounceMs);
    });
  }

  private speculativePrefetch(query: string): void {
    // Prefetch likely next keystrokes (English letter-frequency order):
    const nextChars = "etaoinsrhld".split("");
    const targets = nextChars
      .map((c) => query + c)
      .filter((q) => !this.cache.has(q))
      .slice(0, 3);
    if ("requestIdleCallback" in globalThis) {
      requestIdleCallback(() => {
        for (const target of targets) {
          fetch(`/api/autocomplete?q=${encodeURIComponent(target)}`)
            .then((r) => r.json())
            .then((results) => this.cache.set(target, results))
            .catch(() => {});
        }
      });
    }
  }
}
class LRUCache<K, V> {
  private map = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    // Re-insert to mark as most recently used:
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  has(key: K): boolean { return this.map.has(key); }
}
Prefix-Based Sharding
DISTRIBUTED AUTOCOMPLETE:
═════════════════════════
Shard A: a-f    Shard B: g-m    Shard C: n-s    Shard D: t-z
Query "javascript" → route to Shard B (j ∈ g-m)
Hot key problem: "the", "to", "that" all go to Shard D.
Solution: Sub-shard hot prefixes (D1: ta-th, D2: ti-tz).
Each shard: 3 replicas. Read from any (eventual consistency fine for autocomplete).
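The routing rule above can be sketched as a small range lookup. Everything here is illustrative — the shard names, letter ranges, and the D1/D2 sub-shard split are the assumptions from this section, not a real deployment:

```typescript
// Prefix ranges, checked in order: hot-prefix sub-shards (D1, D2) are
// listed before the broad "t-z" fallback so they win for "th"/"to" queries.
const SHARDS: Array<{ name: string; from: string; to: string }> = [
  { name: "A", from: "a", to: "f" },
  { name: "B", from: "g", to: "m" },
  { name: "C", from: "n", to: "s" },
  { name: "D1", from: "ta", to: "th" }, // sub-shard for hot "t" prefixes
  { name: "D2", from: "ti", to: "tz" },
  { name: "D", from: "t", to: "z" },    // fallback for the rest of t-z
];

function routeShard(query: string): string {
  const q = query.toLowerCase();
  for (const shard of SHARDS) {
    // Compare a prefix of the same length as the range bounds:
    const prefix = q.slice(0, shard.from.length);
    if (prefix >= shard.from && prefix <= shard.to) return shard.name;
  }
  return "A"; // default shard for non-alphabetic input
}
```

With this table, "javascript" routes to B, while the hot queries "the" and "to" land on the D1 and D2 sub-shards instead of overloading D.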
Interview Q&A
Q: Why use a BK-tree for fuzzy matching instead of computing Levenshtein distance against every word?
A: Brute-force Levenshtein against N words is O(N × L²) per query. A BK-tree exploits the triangle inequality: if dist(query, node) = d and max distance is k, only children at edges [d-k, d+k] are checked. For 100K words with max distance 2, a BK-tree examines ~5-10% of nodes, dropping from O(N) to approximately O(N^(2/3)) comparisons.
Q: How do you handle the ranking tradeoff between global popularity and personalization?
A: Use adaptive weights: w_personal = min(0.4, 0.05 × search_history_count). New users get pure global ranking; users with 8+ searches get 40% personalization. Apply recency decay so old searches lose influence. This prevents cold-start problems while rewarding returning users with relevant suggestions.
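As a sketch, the adaptive rule in this answer is just a capped linear ramp. The function names and the renormalized two-term blend are illustrative, not part of the article's ranker:

```typescript
// Personalization weight grows with search history and caps at 0.4.
function personalizationWeight(searchHistoryCount: number): number {
  return Math.min(0.4, 0.05 * searchHistoryCount);
}

// Blend global popularity with the personal score, renormalizing so the
// two weights always sum to 1 (new users get pure global ranking).
function adaptiveBlend(globalPop: number, personal: number, historyCount: number): number {
  const wPersonal = personalizationWeight(historyCount);
  return (1 - wPersonal) * globalPop + wPersonal * personal;
}
```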
Q: Why debounce on the client and not just on the server?
A: Client debounce prevents sending 4 requests for "java" typed in 200ms. A 150ms debounce sends only "java". Server-side throttling is also needed for DDoS protection, but client debounce is the first line of defense. Combine it with speculative prefetching: after "java" arrives, prefetch "javas"/"javac" during idle time so the next keystroke is instant from cache.
Q: How would you design autocomplete for 1 billion queries/day?
A: (1) Shard the trie by prefix across hundreds of servers. (2) Cache hot prefixes at the CDN edge (the top-1000 prefixes serve 80%+ of traffic). (3) Precompute top-10 for all 2-3 char prefixes and serve from memory. (4) Update popularity scores asynchronously via a streaming pipeline. (5) Use read replicas — autocomplete tolerates seconds of staleness. Budget: 100 shards × 3 replicas = 300 servers; after the CDN absorbs 80%, roughly 2,300 QPS reach the origin, or about 8 QPS per replica.
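The capacity arithmetic behind that budget, as a quick check (all inputs are the stated assumptions, not measurements):

```typescript
// Back-of-envelope capacity check for 1B queries/day.
const queriesPerDay = 1e9;
const avgQps = queriesPerDay / 86_400;       // ≈ 11,574 QPS average
const cdnHitRate = 0.8;                      // CDN absorbs 80% of traffic
const originQps = avgQps * (1 - cdnHitRate); // ≈ 2,315 QPS reach the origin
const servers = 100 * 3;                     // 100 shards × 3 replicas
const qpsPerReplica = originQps / servers;   // ≈ 7.7 QPS per replica
```

Note this uses the daily average; peak traffic is typically 2-3× average, so per-replica headroom matters more than these numbers suggest.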