Rabin-Karp Algorithm

String Algorithms

O(n + m) average time, O(1) space

Intermediate

A string-searching algorithm that uses hashing to find pattern(s) in a text. Developed by Michael O. Rabin and Richard M. Karp in 1987, it's particularly useful for multiple pattern search and plagiarism detection. Uses rolling hash for efficiency.

Prerequisites:

Hashing

Modular arithmetic

Strings

Visualization

Interactive visualization for Rabin-Karp Algorithm

Rabin-Karp String Matching

Text:

Pattern:

Text:

• Time: O(n + m) average, O(nm) worst

• Uses rolling hash function

• Good for multiple pattern matching

Interactive visualization with step-by-step execution

Implementation

Language:

1function rabinKarp(text: string, pattern: string): number[] {
2  const result: number[] = [];
3  const d = 256; // number of characters in alphabet
4  const q = 101; // prime number for modulo
5  const m = pattern.length;
6  const n = text.length;
7  
8  let patternHash = 0;
9  let textHash = 0;
10  let h = 1;
11  
12  // Calculate h = d^(m-1) % q
13  for (let i = 0; i < m - 1; i++) {
14    h = (h * d) % q;
15  }
16  
17  // Calculate initial hash values
18  for (let i = 0; i < m; i++) {
19    patternHash = (d * patternHash + pattern.charCodeAt(i)) % q;
20    textHash = (d * textHash + text.charCodeAt(i)) % q;
21  }
22  
23  // Slide pattern over text
24  for (let i = 0; i <= n - m; i++) {
25    if (patternHash === textHash) {
26      // Check character by character
27      let match = true;
28      for (let j = 0; j < m; j++) {
29        if (text[i + j] !== pattern[j]) {
30          match = false;
31          break;
32        }
33      }
34      if (match) result.push(i);
35    }
36    
37    // Calculate hash for next window
38    if (i < n - m) {
39      textHash = (d * (textHash - text.charCodeAt(i) * h) + text.charCodeAt(i + m)) % q;
40      if (textHash < 0) textHash += q;
41    }
42  }
43  
44  return result;
45}

Deep Dive

Theoretical Foundation

Rabin-Karp uses a clever rolling hash function to achieve efficient pattern matching. The hash function treats strings as numbers in base d (alphabet size). For a string s of length m: hash = (s[0]*d^(m-1) + s[1]*d^(m-2) + ... + s[m-1]) mod q, where q is a large prime to avoid overflow. The beauty is in 'rolling': to compute hash for next window, we subtract contribution of outgoing character and add incoming character in O(1) time. Hash(s[i+1..i+m]) = (d * (Hash(s[i..i+m-1]) - s[i]*d^(m-1)) + s[i+m]) mod q. This allows checking all text windows in O(n) time. When hashes match (potential match), we verify with character-by-character comparison to handle hash collisions. Average case is O(n+m), worst case O(nm) with many collisions. The algorithm shines when searching for multiple patterns simultaneously - compute all pattern hashes once, then compare text hash against all.

Complexity

Time

Best

O(n + m)

Average

O(n + m)

Worst

O(nm)

Space

Required

O(1)

Applications

Industry Use

Plagiarism detection (Turnitin, Copyscape)

Multiple pattern search in text mining

DNA sequence searching (genomics research)

Document similarity detection

2D pattern matching in images (computer vision)

Network intrusion detection (multiple signature matching)

File comparison and diff tools

Spam filtering (multiple keyword detection)

Use Cases

Multiple pattern search

Plagiarism detection

Document comparison

Related Algorithms

Knuth-Morris-Pratt (KMP) Algorithm

An efficient string pattern matching algorithm that searches for occurrences of a 'word' within a 'text' by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin. Developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, it's one of the most important string algorithms with O(n+m) time complexity.

String Algorithms

Boyer-Moore Algorithm

One of the most efficient string searching algorithms in practice, using two heuristics: bad character rule and good suffix rule. Developed by Robert S. Boyer and J Strother Moore in 1977, it's the standard benchmark for practical string search, often outperforming other algorithms by skipping sections of text.

String Algorithms

Aho-Corasick Algorithm

A string-searching algorithm for locating elements of a finite set of strings (dictionary) within an input text. Invented by Alfred V. Aho and Margaret J. Corasick in 1975, it's a kind of dictionary-matching algorithm that simultaneously searches for all patterns in linear time, making it extremely efficient for multiple pattern matching.

String Algorithms

Manacher's Algorithm

An optimal algorithm for finding the longest palindromic substring in linear time. Invented by Glenn K. Manacher in 1975, it cleverly avoids redundant comparisons by utilizing previously computed palindrome information, achieving O(n) time complexity which is optimal for this problem.

String Algorithms