Jaro-Winkler Distance

String Algorithms

O(n×m) time, O(n+m) space

Advanced

String similarity metric giving more weight to common prefix. Used for record linkage.

Prerequisites:

String similarity concepts

Visualization

Interactive visualization for Jaro-Winkler Distance

String 1

String 2

Interactive visualization with step-by-step execution

Implementation

Language:

1function jaroWinkler(s1: string, s2: string): number {
2  const jaro = (a: string, b: string): number => {
3    if (a === b) return 1;
4    const matchDist = Math.floor(Math.max(a.length, b.length) / 2) - 1;
5    const aMatch = new Array(a.length).fill(false);
6    const bMatch = new Array(b.length).fill(false);
7    let matches = 0, trans = 0;
8    for (let i = 0; i < a.length; i++) {
9      const start = Math.max(0, i - matchDist);
10      const end = Math.min(i + matchDist + 1, b.length);
11      for (let j = start; j < end; j++) {
12        if (bMatch[j] || a[i] !== b[j]) continue;
13        aMatch[i] = bMatch[j] = true;
14        matches++;
15        break;
16      }
17    }
18    if (matches === 0) return 0;
19    let k = 0;
20    for (let i = 0; i < a.length; i++) {
21      if (!aMatch[i]) continue;
22      while (!bMatch[k]) k++;
23      if (a[i] !== b[k]) trans++;
24      k++;
25    }
26    return (matches / a.length + matches / b.length + (matches - trans / 2) / matches) / 3;
27  };
28  const j = jaro(s1, s2);
29  let prefix = 0;
30  for (let i = 0; i < Math.min(4, s1.length, s2.length); i++) {
31    if (s1[i] === s2[i]) prefix++; else break;
32  }
33  return j + prefix * 0.1 * (1 - j);
34}

Deep Dive

Theoretical Foundation

Extension of Jaro distance that boosts score if strings share a common prefix up to 4 chars. Jaro considers matching chars within certain distance and transpositions.

Complexity

Time

Best

O(n×m)

Average

O(n×m)

Worst

O(n×m)

Space

Required

O(n+m)

Applications

Industry Use

Record linkage

Duplicate detection

Name matching in databases

Use Cases

Record linkage

Name matching

Duplicate detection

Related Algorithms

Knuth-Morris-Pratt (KMP) Algorithm

An efficient string pattern matching algorithm that searches for occurrences of a 'word' within a 'text' by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin. Developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, it's one of the most important string algorithms with O(n+m) time complexity.

String Algorithms

Rabin-Karp Algorithm

A string-searching algorithm that uses hashing to find pattern(s) in a text. Developed by Michael O. Rabin and Richard M. Karp in 1987, it's particularly useful for multiple pattern search and plagiarism detection. Uses rolling hash for efficiency.

String Algorithms

Boyer-Moore Algorithm

One of the most efficient string searching algorithms in practice, using two heuristics: bad character rule and good suffix rule. Developed by Robert S. Boyer and J Strother Moore in 1977, it's the standard benchmark for practical string search, often outperforming other algorithms by skipping sections of text.

String Algorithms

Aho-Corasick Algorithm

A string-searching algorithm for locating elements of a finite set of strings (dictionary) within an input text. Invented by Alfred V. Aho and Margaret J. Corasick in 1975, it's a kind of dictionary-matching algorithm that simultaneously searches for all patterns in linear time, making it extremely efficient for multiple pattern matching.

String Algorithms