Huffman Compression
Huffman Coding is an optimal lossless data compression algorithm that assigns variable-length codes to characters based on their frequencies. Invented by David Huffman in 1952 while he was a PhD student at MIT, it remains one of the most widely used compression techniques. The algorithm creates a binary tree where frequent characters get short codes (e.g., '0') and rare characters get longer codes (e.g., '10110'), achieving optimal compression for symbol-by-symbol encoding. Used in ZIP, JPEG, MP3, and countless other formats.
Visualization
Interactive visualization for Huffman Compression
Huffman Coding
• Optimal prefix-free encoding
• Greedy algorithm
• Used in ZIP, JPEG, MP3
Interactive visualization with step-by-step execution
Implementation
1class HuffmanNode {
2 constructor(
3 public char: string | null,
4 public freq: number,
5 public left: HuffmanNode | null = null,
6 public right: HuffmanNode | null = null
7 ) {}
8
9 isLeaf(): boolean {
10 return this.left === null && this.right === null;
11 }
12}
13
14class HuffmanCoding {
15 private root: HuffmanNode | null = null;
16 private codes: Map<string, string> = new Map();
17
18 private buildFrequencyMap(text: string): Map<string, number> {
19 const freq = new Map<string, number>();
20 for (const char of text) {
21 freq.set(char, (freq.get(char) || 0) + 1);
22 }
23 return freq;
24 }
25
26 private buildTree(freq: Map<string, number>): HuffmanNode {
27 const heap: HuffmanNode[] = [];
28
29 // Create leaf nodes
30 for (const [char, frequency] of freq) {
31 heap.push(new HuffmanNode(char, frequency));
32 }
33
34 // Build heap
35 heap.sort((a, b) => a.freq - b.freq);
36
37 // Build tree
38 while (heap.length > 1) {
39 const left = heap.shift()!;
40 const right = heap.shift()!;
41
42 const parent = new HuffmanNode(
43 null,
44 left.freq + right.freq,
45 left,
46 right
47 );
48
49 // Insert and maintain heap property
50 let inserted = false;
51 for (let i = 0; i < heap.length; i++) {
52 if (parent.freq < heap[i].freq) {
53 heap.splice(i, 0, parent);
54 inserted = true;
55 break;
56 }
57 }
58 if (!inserted) heap.push(parent);
59 }
60
61 return heap[0];
62 }
63
64 private generateCodes(node: HuffmanNode | null, code: string = ''): void {
65 if (!node) return;
66
67 if (node.isLeaf() && node.char !== null) {
68 this.codes.set(node.char, code || '0'); // Single char edge case
69 return;
70 }
71
72 this.generateCodes(node.left, code + '0');
73 this.generateCodes(node.right, code + '1');
74 }
75
76 encode(text: string): { encoded: string; tree: HuffmanNode } {
77 if (!text) return { encoded: '', tree: new HuffmanNode(null, 0) };
78
79 const freq = this.buildFrequencyMap(text);
80 this.root = this.buildTree(freq);
81 this.codes.clear();
82 this.generateCodes(this.root);
83
84 let encoded = '';
85 for (const char of text) {
86 encoded += this.codes.get(char) || '';
87 }
88
89 return { encoded, tree: this.root };
90 }
91
92 decode(encoded: string, tree: HuffmanNode): string {
93 if (!encoded) return '';
94
95 let decoded = '';
96 let current = tree;
97
98 for (const bit of encoded) {
99 current = bit === '0' ? current.left! : current.right!;
100
101 if (current.isLeaf()) {
102 decoded += current.char;
103 current = tree;
104 }
105 }
106
107 return decoded;
108 }
109
110 getCompressionRatio(original: string, encoded: string): number {
111 const originalBits = original.length * 8; // ASCII
112 return (1 - encoded.length / originalBits) * 100;
113 }
114
115 getCodes(): Map<string, string> {
116 return new Map(this.codes);
117 }
118}Deep Dive
Theoretical Foundation
Huffman Coding builds a binary tree bottom-up using a greedy algorithm. Start with leaf nodes for each character and their frequencies. Repeatedly merge the two nodes with lowest frequencies into a parent node with combined frequency, inserting back into priority queue. This process continues until one root node remains. The tree structure guarantees prefix-free codes: no code is a prefix of another, allowing unambiguous decoding. Traversing left adds '0', right adds '1' to the code. The algorithm achieves the entropy limit H(X) for symbol-by-symbol coding, where H(X) = -Σ p(x)log₂p(x). Time: O(n log n) for building tree (heap operations), O(n) for encoding/decoding. Space: O(n) for tree.
Complexity
Time
O(n log n)
O(n log n)
O(n log n)
Space
O(n)
Applications
Industry Use
ZIP compression
JPEG image compression
MP3 audio compression
Network protocols
File compression utilities
Use Cases
Related Algorithms
A* Search Algorithm
Informed search algorithm combining best-first search with Dijkstra's algorithm using heuristics. Widely used in pathfinding and graph traversal, A* is optimal and complete when using admissible heuristic. Used in games, GPS navigation, and robotics. Invented by Peter Hart, Nils Nilsson, and Bertram Raphael in 1968.
Convex Hull (Graham Scan)
Find smallest convex polygon containing all points. Graham Scan invented by Ronald Graham in 1972, runs in O(n log n). Essential in computational geometry, computer graphics, and pattern recognition.
Line Segment Intersection
Determine if two line segments intersect. Fundamental geometric primitive used in graphics, CAD, GIS. Uses orientation and collinearity tests.
Caesar Cipher
The Caesar Cipher is one of the oldest and simplest encryption techniques, named after Julius Caesar who used it to protect military messages around 100 BC. It works by shifting each letter in the plaintext by a fixed number of positions down the alphabet. For example, with a shift of 3, A becomes D, B becomes E, and so on. Despite being used for over 2000 years, it's extremely weak by modern standards with only 25 possible keys, making it trivially breakable by brute force or frequency analysis.