Principal Component Analysis (PCA)
Dimensionality reduction technique transforming data to new coordinate system where greatest variance lies on first coordinates (principal components). Unsupervised, linear transformation preserving maximum variance.
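The transformation described above can be written compactly in matrix form (a standard formulation, using notation introduced here for illustration): given a data matrix $X \in \mathbb{R}^{n \times m}$ with per-feature mean vector $\bar{x}$,

```latex
\tilde{X} = X - \mathbf{1}\bar{x}^{\top}, \qquad
C = \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X}, \qquad
C = W \Lambda W^{\top}, \qquad
Y = \tilde{X} W_k
```

where $\tilde{X}$ is the centered data, $C$ the covariance matrix, $W$ the matrix of eigenvectors with eigenvalues $\Lambda$ sorted in descending order, and $W_k$ the first $k$ eigenvectors; $Y$ is the reduced $n \times k$ representation.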
Visualization
Interactive visualization for Principal Component Analysis with step-by-step execution.
Implementation
class PCA {
  private components: number[][] = [];
  private mean: number[] = [];
  private explainedVariance: number[] = [];

  constructor(private nComponents: number = 2) {}

  fit(X: number[][]): void {
    const n = X.length;
    const m = X[0].length;

    // Compute the per-feature mean
    this.mean = new Array(m).fill(0);
    for (const row of X) {
      for (let j = 0; j < m; j++) {
        this.mean[j] += row[j] / n;
      }
    }

    // Center the data
    const X_centered = X.map(row => row.map((val, j) => val - this.mean[j]));

    // Compute the covariance matrix of the centered data
    const cov = this.computeCovariance(X_centered);

    // Eigendecomposition (power iteration with deflation; prefer a numeric library in production)
    const { eigenvectors, eigenvalues } = this.eigenDecomposition(cov);

    // Sort eigenpairs by descending eigenvalue and keep the top nComponents
    const indices = eigenvalues
      .map((val, idx) => ({ val, idx }))
      .sort((a, b) => b.val - a.val)
      .slice(0, this.nComponents)
      .map(item => item.idx);

    this.components = indices.map(i => eigenvectors[i]);
    this.explainedVariance = indices.map(i => eigenvalues[i]);
  }

  transform(X: number[][]): number[][] {
    // Center using the mean learned in fit()
    const X_centered = X.map(row => row.map((val, j) => val - this.mean[j]));

    // Project each row onto the principal components
    return X_centered.map(row =>
      this.components.map(comp =>
        row.reduce((sum, val, i) => sum + val * comp[i], 0)
      )
    );
  }

  fitTransform(X: number[][]): number[][] {
    this.fit(X);
    return this.transform(X);
  }

  private computeCovariance(X: number[][]): number[][] {
    const n = X.length;
    const m = X[0].length;
    const cov: number[][] = Array.from({ length: m }, () => new Array(m).fill(0));

    for (let i = 0; i < m; i++) {
      for (let j = i; j < m; j++) {
        let sum = 0;
        for (const row of X) {
          sum += row[i] * row[j];
        }
        cov[i][j] = cov[j][i] = sum / (n - 1);
      }
    }

    return cov;
  }

  private eigenDecomposition(matrix: number[][]): {
    eigenvectors: number[][];
    eigenvalues: number[];
  } {
    // Power iteration with Hotelling deflation: repeatedly extract the
    // dominant eigenpair, then subtract its contribution from the matrix.
    // Adequate for small symmetric matrices; use a numeric library in production.
    const m = matrix.length;
    let A = matrix.map(row => [...row]);
    const eigenvectors: number[][] = [];
    const eigenvalues: number[] = [];

    for (let k = 0; k < m; k++) {
      let v = Array.from({ length: m }, () => Math.random());
      for (let iter = 0; iter < 1000; iter++) {
        const Av = A.map(row => row.reduce((s, a, i) => s + a * v[i], 0));
        const norm = Math.sqrt(Av.reduce((s, x) => s + x * x, 0));
        if (norm < 1e-12) break;
        v = Av.map(x => x / norm);
      }
      // Rayleigh quotient v^T A v gives the eigenvalue
      const Av = A.map(row => row.reduce((s, a, i) => s + a * v[i], 0));
      const lambda = v.reduce((s, x, i) => s + x * Av[i], 0);
      eigenvectors.push(v);
      eigenvalues.push(lambda);
      // Deflate: A <- A - lambda * v * v^T
      A = A.map((row, i) => row.map((a, j) => a - lambda * v[i] * v[j]));
    }

    return { eigenvectors, eigenvalues };
  }

  getExplainedVarianceRatio(): number[] {
    const total = this.explainedVariance.reduce((a, b) => a + b, 0);
    return this.explainedVariance.map(v => v / total);
  }
}
Deep Dive
Theoretical Foundation
PCA finds orthogonal directions of maximum variance in the data via eigendecomposition of the covariance matrix or, equivalently, SVD of the centered data matrix. Components are ordered by explained variance, so keeping only the first few reduces dimensionality while retaining most of the information. PCA assumes linear relationships and is sensitive to feature scaling, so features are typically standardized first.
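As a concrete illustration of the theory, here is a self-contained sketch for the two-feature case, where the eigenpairs of the symmetric 2x2 covariance matrix have a closed form (the helper name and sample data are illustrative, not part of the class above):

```typescript
// Closed-form first principal component for 2-feature data.
function principalAxis2D(X: number[][]): { axis: number[]; varianceExplained: number } {
  const n = X.length;
  const mean = [0, 1].map(j => X.reduce((s, r) => s + r[j], 0) / n);
  const C = X.map(r => [r[0] - mean[0], r[1] - mean[1]]);

  // Entries of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]]
  const cxx = C.reduce((s, r) => s + r[0] * r[0], 0) / (n - 1);
  const cyy = C.reduce((s, r) => s + r[1] * r[1], 0) / (n - 1);
  const cxy = C.reduce((s, r) => s + r[0] * r[1], 0) / (n - 1);

  // Eigenvalues from the characteristic polynomial via trace and determinant
  const tr = cxx + cyy;
  const det = cxx * cyy - cxy * cxy;
  const l1 = tr / 2 + Math.sqrt((tr * tr) / 4 - det); // largest eigenvalue
  const l2 = tr - l1;

  // Eigenvector for l1; (l1 - cyy, cxy) solves (C - l1 I) v = 0 when cxy != 0
  let axis = cxy !== 0 ? [l1 - cyy, cxy] : cxx >= cyy ? [1, 0] : [0, 1];
  const norm = Math.hypot(axis[0], axis[1]);
  axis = [axis[0] / norm, axis[1] / norm];

  return { axis, varianceExplained: l1 / (l1 + l2) };
}

// Points spread along the line y = x: the first principal axis
// should come out close to [0.707, 0.707].
const demo = principalAxis2D([[0, 0], [1, 1.1], [2, 1.9], [3, 3.05], [4, 4]]);
```

Because nearly all the variance lies along one direction, `varianceExplained` is close to 1 here, which is exactly the signal `getExplainedVarianceRatio()` reports in the full implementation.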
Complexity
Time: O(min(n²m, nm²)) for n samples and m features
Space: O(m²) for the covariance matrix
Applications
Industry Use
Image compression and processing
Data visualization and exploration
Feature extraction for machine learning
Genomics and bioinformatics
Finance (portfolio optimization)
Face recognition systems
Preprocessing for neural networks
Related Algorithms
K-Nearest Neighbors (KNN)
Simple, instance-based learning algorithm that classifies new data points based on k closest training examples. Non-parametric, lazy learning method used for classification and regression.
Linear Regression
Fundamental supervised learning algorithm modeling relationship between dependent variable and independent variables using linear equation. Foundational for predictive modeling and statistics.
Logistic Regression
Binary classification algorithm using the sigmoid function to model the probability of class membership. Despite its name, it performs classification, not regression. Foundation for neural networks.
Decision Tree
Tree-based model making decisions through sequence of if-else questions. Splits data based on feature values to create hierarchical structure. Interpretable and handles non-linear relationships.