Knuth-Morris-Pratt (KMP) Algorithm

String Algorithms

O(n + m) time, O(m) space

Advanced

An efficient string pattern matching algorithm that searches for occurrences of a 'word' within a 'text' by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin. Developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, it's one of the most important string algorithms with O(n+m) time complexity.

Prerequisites:

Strings

Arrays

Prefix concepts

Visualization

Interactive visualization for Knuth-Morris-Pratt (KMP) Algorithm

KMP String Matching Algorithm

Text:

Pattern:

Text:

Pattern:

• Time: O(n + m) where n = text length, m = pattern length

• Space: O(m) for LPS array

• KMP avoids re-scanning matched characters using LPS array

Interactive visualization with step-by-step execution

Implementation

Language:

1function kmpSearch(text: string, pattern: string): number[] {
2  const result: number[] = [];
3  const lps = buildLPS(pattern);
4  let i = 0; // text index
5  let j = 0; // pattern index
6  
7  while (i < text.length) {
8    if (text[i] === pattern[j]) {
9      i++;
10      j++;
11    }
12    
13    if (j === pattern.length) {
14      result.push(i - j);
15      j = lps[j - 1];
16    } else if (i < text.length && text[i] !== pattern[j]) {
17      if (j !== 0) {
18        j = lps[j - 1];
19      } else {
20        i++;
21      }
22    }
23  }
24  
25  return result;
26}
27
28function buildLPS(pattern: string): number[] {
29  const lps: number[] = Array(pattern.length).fill(0);
30  let len = 0;
31  let i = 1;
32  
33  while (i < pattern.length) {
34    if (pattern[i] === pattern[len]) {
35      len++;
36      lps[i] = len;
37      i++;
38    } else {
39      if (len !== 0) {
40        len = lps[len - 1];
41      } else {
42        lps[i] = 0;
43        i++;
44      }
45    }
46  }
47  
48  return lps;
49}

Deep Dive

Theoretical Foundation

KMP (Knuth-Morris-Pratt) builds a Longest Proper Prefix which is also Suffix (LPS) array during preprocessing. The LPS array stores, for each position i in the pattern, the length of the longest proper prefix that is also a suffix of pattern[0..i]. This crucial data structure allows the algorithm to avoid re-examining characters in the text when a mismatch occurs. Instead of backtracking in the text (like naive approach), KMP uses LPS to determine how far to shift the pattern. For example, if pattern='ABABC' and we match 'ABAB' then mismatch at 'C', we know 'AB' at the end matches 'AB' at the start, so we can continue matching from position 2 in pattern without moving back in text. This eliminates all backtracking, achieving O(n+m) time where n is text length and m is pattern length. The preprocessing itself takes O(m) time using a clever self-matching approach.

Complexity

Time

Best

O(n + m)

Average

O(n + m)

Worst

O(n + m)

Space

Required

O(m)

Applications

Industry Use

Text editors for Find/Replace functionality (VS Code, Sublime)

DNA sequence matching in bioinformatics

Plagiarism detection systems

Network intrusion detection (packet inspection)

Log file analysis and pattern monitoring

Substring search in databases (PostgreSQL, MySQL)

Compiler lexical analysis and tokenization

Data deduplication systems

Virus signature matching in antivirus software

Use Cases

Text search

DNA matching

Pattern recognition

Related Algorithms

Rabin-Karp Algorithm

A string-searching algorithm that uses hashing to find pattern(s) in a text. Developed by Michael O. Rabin and Richard M. Karp in 1987, it's particularly useful for multiple pattern search and plagiarism detection. Uses rolling hash for efficiency.

String Algorithms

Boyer-Moore Algorithm

One of the most efficient string searching algorithms in practice, using two heuristics: bad character rule and good suffix rule. Developed by Robert S. Boyer and J Strother Moore in 1977, it's the standard benchmark for practical string search, often outperforming other algorithms by skipping sections of text.

String Algorithms

Aho-Corasick Algorithm

A string-searching algorithm for locating elements of a finite set of strings (dictionary) within an input text. Invented by Alfred V. Aho and Margaret J. Corasick in 1975, it's a kind of dictionary-matching algorithm that simultaneously searches for all patterns in linear time, making it extremely efficient for multiple pattern matching.

String Algorithms

Manacher's Algorithm

An optimal algorithm for finding the longest palindromic substring in linear time. Invented by Glenn K. Manacher in 1975, it cleverly avoids redundant comparisons by utilizing previously computed palindrome information, achieving O(n) time complexity which is optimal for this problem.

String Algorithms