Huffman Coding
Huffman Coding is a lossless data compression algorithm that creates optimal prefix-free variable-length codes based on character frequencies. Developed by David A. Huffman in 1952 as a student at MIT, it uses a greedy approach to build a binary tree where frequent characters get shorter codes. This algorithm is fundamental in ZIP, JPEG, MP3, and many compression formats.
Visualization
Interactive visualization for Huffman Coding
Huffman Coding
• Optimal prefix-free encoding
• Greedy algorithm
• Used in ZIP, JPEG, MP3
Interactive visualization with step-by-step execution
Implementation
1class HuffmanNode {
2 char: string;
3 freq: number;
4 left: HuffmanNode | null = null;
5 right: HuffmanNode | null = null;
6
7 constructor(char: string, freq: number) {
8 this.char = char;
9 this.freq = freq;
10 }
11}
12
13function huffmanCoding(text: string): Map<string, string> {
14 const freq = new Map<string, number>();
15 for (const char of text) {
16 freq.set(char, (freq.get(char) || 0) + 1);
17 }
18
19 const heap: HuffmanNode[] = [];
20 for (const [char, count] of freq) {
21 heap.push(new HuffmanNode(char, count));
22 }
23 heap.sort((a, b) => a.freq - b.freq);
24
25 while (heap.length > 1) {
26 const left = heap.shift()!;
27 const right = heap.shift()!;
28 const parent = new HuffmanNode('', left.freq + right.freq);
29 parent.left = left;
30 parent.right = right;
31
32 let i = 0;
33 while (i < heap.length && heap[i].freq < parent.freq) i++;
34 heap.splice(i, 0, parent);
35 }
36
37 const codes = new Map<string, string>();
38 const buildCodes = (node: HuffmanNode | null, code: string) => {
39 if (!node) return;
40 if (node.char) codes.set(node.char, code);
41 buildCodes(node.left, code + '0');
42 buildCodes(node.right, code + '1');
43 };
44 buildCodes(heap[0], '');
45 return codes;
46}Deep Dive
Theoretical Foundation
Huffman Coding builds a binary tree bottom-up using a greedy strategy. Characters with higher frequencies are placed closer to the root (shorter codes), while rare characters are deeper (longer codes). The algorithm uses a min-heap to repeatedly merge the two nodes with smallest frequencies. No code is a prefix of another (prefix-free property), enabling unambiguous decoding. Optimal among all prefix-free codes. Time: O(n log n) for building tree, Space: O(n) for tree and codes.
Complexity
Time
O(n log n)
O(n log n)
O(n log n)
Space
O(n)
Applications
Industry Use
ZIP/GZIP file compression
JPEG image compression (combined with DCT)
MP3 audio compression
DEFLATE algorithm (ZIP, PNG)
Fax machine compression
Text file compression
Network data transmission
Use Cases
Related Algorithms
Activity Selection Problem
Select the maximum number of non-overlapping activities from a set, where each activity has a start and end time. This classic greedy algorithm demonstrates the greedy choice property: always selecting the activity that finishes earliest leaves the most room for remaining activities. Used in scheduling problems, resource allocation, and interval management. Achieves optimal solution with O(n log n) time complexity.
Fractional Knapsack Problem
Given items with values and weights, and a knapsack with capacity, select items (or fractions thereof) to maximize total value. Unlike the 0/1 knapsack where items must be taken whole, the fractional knapsack allows taking fractions of items. The greedy approach of taking items in order of value-to-weight ratio yields the optimal solution in O(n log n) time. This demonstrates when greedy algorithms work vs. when dynamic programming is needed.
Job Sequencing with Deadlines
Schedule jobs with deadlines and profits to maximize total profit. Each job takes 1 unit time and has a deadline and profit. The greedy strategy is to sort jobs by profit (descending) and greedily schedule each job as late as possible before its deadline. This maximizes profit while respecting constraints. Used in task scheduling, CPU job management, and project planning.
Minimum Platforms Problem
Find the minimum number of platforms required for a railway station given arrival and departure times of trains. No train should wait. The elegant solution uses the sorted merge technique: sort arrivals and departures separately, then use two pointers to track platforms needed at each moment. This greedy approach simulates the timeline efficiently in O(n log n) time, commonly asked in interviews.