Rolling Hash (Rabin-Karp Hash)
Efficiently compute hash values for sliding windows in O(1) time per update, enabling fast substring search. Instead of recomputing from scratch, the rolling hash updates incrementally: it removes the outgoing leftmost character's contribution and folds in the incoming rightmost character. This is the core technique in Rabin-Karp string matching, plagiarism detection, and any problem requiring fast substring comparison, and it underlies Rabin fingerprinting and data deduplication.
Implementation
class RollingHash {
  private base: number = 256;
  private mod: number = 1e9 + 7;

  // Polynomial hash of s[start .. start + len - 1]
  computeHash(s: string, start: number, len: number): number {
    let hash = 0;
    for (let i = 0; i < len; i++) {
      hash = (hash * this.base + s.charCodeAt(start + i)) % this.mod;
    }
    return hash;
  }

  // Slide the window one position right: remove oldChar (weighted by
  // power = base^(windowLength - 1)), shift, and append newChar.
  roll(hash: number, oldChar: string, newChar: string, power: number): number {
    hash = (hash - (oldChar.charCodeAt(0) * power) % this.mod + this.mod) % this.mod;
    hash = (hash * this.base + newChar.charCodeAt(0)) % this.mod;
    return hash;
  }

  // Rabin-Karp search: return every index where pattern occurs in text
  search(text: string, pattern: string): number[] {
    const n = text.length, m = pattern.length;
    if (m > n) return [];

    const matches: number[] = [];

    // power = base^(m-1) mod mod, the weight of the outgoing character
    let power = 1;
    for (let i = 0; i < m - 1; i++) {
      power = (power * this.base) % this.mod;
    }

    const patternHash = this.computeHash(pattern, 0, m);
    let textHash = this.computeHash(text, 0, m);

    if (textHash === patternHash && text.substring(0, m) === pattern) {
      matches.push(0);
    }

    for (let i = 1; i <= n - m; i++) {
      textHash = this.roll(textHash, text[i - 1], text[i + m - 1], power);
      // Compare the actual substring to rule out hash collisions
      if (textHash === patternHash && text.substring(i, i + m) === pattern) {
        matches.push(i);
      }
    }

    return matches;
  }
}
Deep Dive
Theoretical Foundation
Rolling Hash uses polynomial hashing: hash(s) = (s[0]×base^(n-1) + s[1]×base^(n-2) + ... + s[n-1]×base^0) mod prime. When the window slides one position right, subtract the leftmost character's contribution, multiply by base to shift every remaining power up, and add the new rightmost character. Formula: newHash = ((oldHash - s[left]×base^(n-1)) × base + s[right]) mod prime. The modulo operation keeps values bounded, preventing overflow. Choose a prime modulus (e.g., 10^9+7) and a base (e.g., 256 for raw bytes, 31 for lowercase strings) to minimize collisions. Time: O(m) for the initial hash, O(1) per roll. On a hash match, the actual strings must still be compared, because collisions (spurious hits) are possible.
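The update formula above can be checked mechanically: rolling one step right must produce the same value as a full polynomial recompute of the new window. A minimal standalone sketch (the helper names `fullHash`, `topPower`, and `rollOnce` are illustrative, not part of the class above):

```typescript
const BASE = 256;
const MOD = 1_000_000_007;

// Polynomial hash of s[start .. start + len - 1]
function fullHash(s: string, start: number, len: number): number {
  let h = 0;
  for (let i = 0; i < len; i++) {
    h = (h * BASE + s.charCodeAt(start + i)) % MOD;
  }
  return h;
}

// BASE^(len - 1) mod MOD: the weight of the outgoing leftmost character
function topPower(len: number): number {
  let p = 1;
  for (let i = 0; i < len - 1; i++) p = (p * BASE) % MOD;
  return p;
}

// newHash = ((oldHash - out×BASE^(len-1)) × BASE + in) mod MOD
function rollOnce(hash: number, outChar: string, inChar: string, power: number): number {
  hash = (hash - (outChar.charCodeAt(0) * power) % MOD + MOD) % MOD;
  return (hash * BASE + inChar.charCodeAt(0)) % MOD;
}

const text = "abcde";
const m = 3;
// Roll the window from "abc" to "bcd" and compare with a fresh recompute
const rolled = rollOnce(fullHash(text, 0, m), text[0], text[m], topPower(m));
console.log(rolled === fullHash(text, 1, m)); // → true
```

The `+ MOD` before the final reduction in `rollOnce` matters: the subtraction can go negative, and JavaScript's `%` preserves the sign of its left operand.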
Complexity
Time
O(1) per roll update; O(m) to hash the initial window
Space
O(1)
Applications
Industry Use
Rabin-Karp string matching
Plagiarism detection (document fingerprinting)
DNA sequence matching in bioinformatics
Data deduplication (chunking)
Version control (diff algorithms)
Malware signature detection
Content-defined chunking in backup systems
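The deduplication and chunking entries above rely on one property of the rolling hash: a chunk boundary can be declared wherever the hash of a small sliding window hits a chosen bit pattern, so boundaries depend only on local content. A minimal sketch under assumed parameters (`WINDOW` and `MASK` are toy values; production systems use windows of roughly 48 bytes and masks tuned for multi-kilobyte average chunks):

```typescript
const BASE = 263;
const MOD = 1_000_000_007;
const WINDOW = 4;   // sliding window width (toy value for illustration)
const MASK = 0xf;   // boundary when the low 4 hash bits are zero (~1-in-16 positions)

// Return the end offsets of content-defined chunks in data
function chunkBoundaries(data: string): number[] {
  const boundaries: number[] = [];
  if (data.length < WINDOW) return boundaries;

  // Weight of the outgoing byte: BASE^(WINDOW - 1) mod MOD
  let power = 1;
  for (let i = 0; i < WINDOW - 1; i++) power = (power * BASE) % MOD;

  let hash = 0;
  for (let i = 0; i < data.length; i++) {
    if (i >= WINDOW) {
      // Drop the byte leaving the window before adding the new one
      hash = (hash - (data.charCodeAt(i - WINDOW) * power) % MOD + MOD) % MOD;
    }
    hash = (hash * BASE + data.charCodeAt(i)) % MOD;
    if (i >= WINDOW - 1 && (hash & MASK) === 0) {
      boundaries.push(i + 1); // a chunk ends just after position i
    }
  }
  return boundaries;
}
```

Because each boundary is decided by the last `WINDOW` bytes alone, inserting a byte early in a file only perturbs nearby cut points; chunks further along keep their old boundaries and still deduplicate against the previous version.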
Related Algorithms
Knuth-Morris-Pratt (KMP) Algorithm
An efficient string pattern matching algorithm that searches for occurrences of a 'word' within a 'text' by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin. Developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, it's one of the most important string algorithms with O(n+m) time complexity.
Rabin-Karp Algorithm
A string-searching algorithm that uses hashing to find pattern(s) in a text. Developed by Michael O. Rabin and Richard M. Karp in 1987, it's particularly useful for multiple pattern search and plagiarism detection. Uses rolling hash for efficiency.
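The multiple-pattern strength mentioned above comes from hashing every pattern once and scanning the text with a single rolling window. The sketch below is illustrative, not a reference implementation, and assumes all patterns share one length so one window fits them all (`multiSearch`, `hashOf`, and `byHash` are hypothetical names):

```typescript
const BASE = 256;
const MOD = 1_000_000_007;

// Polynomial hash of s[start .. start + len - 1]
function hashOf(s: string, start: number, len: number): number {
  let h = 0;
  for (let i = 0; i < len; i++) h = (h * BASE + s.charCodeAt(start + i)) % MOD;
  return h;
}

// Find all occurrences of every pattern (assumption: uniform pattern length)
function multiSearch(text: string, patterns: string[]): Map<string, number[]> {
  const hits = new Map<string, number[]>();
  for (const p of patterns) hits.set(p, []);
  if (patterns.length === 0) return hits;

  const m = patterns[0].length;

  // Group patterns by fingerprint so one lookup serves the whole set
  const byHash = new Map<number, string[]>();
  for (const p of patterns) {
    const h = hashOf(p, 0, m);
    if (!byHash.has(h)) byHash.set(h, []);
    byHash.get(h)!.push(p);
  }

  // BASE^(m - 1) mod MOD: weight of the outgoing character
  let power = 1;
  for (let i = 0; i < m - 1; i++) power = (power * BASE) % MOD;

  let h = text.length >= m ? hashOf(text, 0, m) : -1;
  for (let i = 0; i + m <= text.length; i++) {
    if (i > 0) {
      h = (h - (text.charCodeAt(i - 1) * power) % MOD + MOD) % MOD;
      h = (h * BASE + text.charCodeAt(i + m - 1)) % MOD;
    }
    for (const p of byHash.get(h) ?? []) {
      if (text.startsWith(p, i)) hits.get(p)!.push(i); // verify on collision
    }
  }
  return hits;
}

// multiSearch("abracadabra", ["abr", "cad"])
//   → Map { "abr" → [0, 7], "cad" → [4] }
```

A single pass over the text serves k patterns, so the expected cost is O(n + k·m) versus O(k·n) for running a single-pattern search k times.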
Boyer-Moore Algorithm
One of the most efficient string searching algorithms in practice, using two heuristics: bad character rule and good suffix rule. Developed by Robert S. Boyer and J Strother Moore in 1977, it's the standard benchmark for practical string search, often outperforming other algorithms by skipping sections of text.
Aho-Corasick Algorithm
A string-searching algorithm for locating elements of a finite set of strings (dictionary) within an input text. Invented by Alfred V. Aho and Margaret J. Corasick in 1975, it's a kind of dictionary-matching algorithm that simultaneously searches for all patterns in linear time, making it extremely efficient for multiple pattern matching.