Huffman Compression

Advanced Algorithms

O(n log n) time, O(n) space

Intermediate

Huffman Coding is an optimal lossless data compression algorithm that assigns variable-length codes to characters based on their frequencies. Invented by David Huffman in 1952 while he was a PhD student at MIT, it remains one of the most widely used compression techniques. The algorithm creates a binary tree where frequent characters get short codes (e.g., '0') and rare characters get longer codes (e.g., '10110'), achieving optimal compression for symbol-by-symbol encoding. Used in ZIP, JPEG, MP3, and countless other formats.

Prerequisites:

Binary Trees

Priority Queue

Greedy Algorithms

Visualization

Interactive visualization for Huffman Compression

Huffman Coding

Input Text:

• Optimal prefix-free encoding

• Greedy algorithm

• Used in ZIP, JPEG, MP3

Interactive visualization with step-by-step execution

Implementation

Language:

1class HuffmanNode {
2  constructor(
3    public char: string | null,
4    public freq: number,
5    public left: HuffmanNode | null = null,
6    public right: HuffmanNode | null = null
7  ) {}
8  
9  isLeaf(): boolean {
10    return this.left === null && this.right === null;
11  }
12}
13
14class HuffmanCoding {
15  private root: HuffmanNode | null = null;
16  private codes: Map<string, string> = new Map();
17  
18  private buildFrequencyMap(text: string): Map<string, number> {
19    const freq = new Map<string, number>();
20    for (const char of text) {
21      freq.set(char, (freq.get(char) || 0) + 1);
22    }
23    return freq;
24  }
25  
26  private buildTree(freq: Map<string, number>): HuffmanNode {
27    const heap: HuffmanNode[] = [];
28    
29    // Create leaf nodes
30    for (const [char, frequency] of freq) {
31      heap.push(new HuffmanNode(char, frequency));
32    }
33    
34    // Build heap
35    heap.sort((a, b) => a.freq - b.freq);
36    
37    // Build tree
38    while (heap.length > 1) {
39      const left = heap.shift()!;
40      const right = heap.shift()!;
41      
42      const parent = new HuffmanNode(
43        null,
44        left.freq + right.freq,
45        left,
46        right
47      );
48      
49      // Insert and maintain heap property
50      let inserted = false;
51      for (let i = 0; i < heap.length; i++) {
52        if (parent.freq < heap[i].freq) {
53          heap.splice(i, 0, parent);
54          inserted = true;
55          break;
56        }
57      }
58      if (!inserted) heap.push(parent);
59    }
60    
61    return heap[0];
62  }
63  
64  private generateCodes(node: HuffmanNode | null, code: string = ''): void {
65    if (!node) return;
66    
67    if (node.isLeaf() && node.char !== null) {
68      this.codes.set(node.char, code || '0'); // Single char edge case
69      return;
70    }
71    
72    this.generateCodes(node.left, code + '0');
73    this.generateCodes(node.right, code + '1');
74  }
75  
76  encode(text: string): { encoded: string; tree: HuffmanNode } {
77    if (!text) return { encoded: '', tree: new HuffmanNode(null, 0) };
78    
79    const freq = this.buildFrequencyMap(text);
80    this.root = this.buildTree(freq);
81    this.codes.clear();
82    this.generateCodes(this.root);
83    
84    let encoded = '';
85    for (const char of text) {
86      encoded += this.codes.get(char) || '';
87    }
88    
89    return { encoded, tree: this.root };
90  }
91  
92  decode(encoded: string, tree: HuffmanNode): string {
93    if (!encoded) return '';
94    
95    let decoded = '';
96    let current = tree;
97    
98    for (const bit of encoded) {
99      current = bit === '0' ? current.left! : current.right!;
100      
101      if (current.isLeaf()) {
102        decoded += current.char;
103        current = tree;
104      }
105    }
106    
107    return decoded;
108  }
109  
110  getCompressionRatio(original: string, encoded: string): number {
111    const originalBits = original.length * 8; // ASCII
112    return (1 - encoded.length / originalBits) * 100;
113  }
114  
115  getCodes(): Map<string, string> {
116    return new Map(this.codes);
117  }
118}

Deep Dive

Theoretical Foundation

Huffman Coding builds a binary tree bottom-up using a greedy algorithm. Start with leaf nodes for each character and their frequencies. Repeatedly merge the two nodes with lowest frequencies into a parent node with combined frequency, inserting back into priority queue. This process continues until one root node remains. The tree structure guarantees prefix-free codes: no code is a prefix of another, allowing unambiguous decoding. Traversing left adds '0', right adds '1' to the code. The algorithm achieves the entropy limit H(X) for symbol-by-symbol coding, where H(X) = -Σ p(x)log₂p(x). Time: O(n log n) for building tree (heap operations), O(n) for encoding/decoding. Space: O(n) for tree.

Complexity

Time

Best

O(n log n)

Average

O(n log n)

Worst

O(n log n)

Space

Required

O(n)

Applications

Industry Use

ZIP compression

JPEG image compression

MP3 audio compression

Network protocols

File compression utilities

Use Cases

File compression

Image compression

Data transmission

Network protocols

Related Algorithms

A* Search Algorithm

Informed search algorithm combining best-first search with Dijkstra's algorithm using heuristics. Widely used in pathfinding and graph traversal, A* is optimal and complete when using admissible heuristic. Used in games, GPS navigation, and robotics. Invented by Peter Hart, Nils Nilsson, and Bertram Raphael in 1968.

Advanced Algorithms

Convex Hull (Graham Scan)

Find smallest convex polygon containing all points. Graham Scan invented by Ronald Graham in 1972, runs in O(n log n). Essential in computational geometry, computer graphics, and pattern recognition.

Advanced Algorithms

Line Segment Intersection

Determine if two line segments intersect. Fundamental geometric primitive used in graphics, CAD, GIS. Uses orientation and collinearity tests.

Advanced Algorithms

Caesar Cipher

The Caesar Cipher is one of the oldest and simplest encryption techniques, named after Julius Caesar who used it to protect military messages around 100 BC. It works by shifting each letter in the plaintext by a fixed number of positions down the alphabet. For example, with a shift of 3, A becomes D, B becomes E, and so on. Despite being used for over 2000 years, it's extremely weak by modern standards with only 25 possible keys, making it trivially breakable by brute force or frequency analysis.

Advanced Algorithms