K-Means Clustering
An unsupervised learning algorithm that partitions n observations into k clusters, each observation assigned to the cluster with the nearest mean. Widely used for data segmentation and pattern discovery.
Visualization
Interactive visualization with step-by-step execution
Implementation
class KMeans {
  private centroids: number[][] = [];
  private labels: number[] = [];

  constructor(private k: number, private maxIters: number = 100) {}

  private euclideanDistance(a: number[], b: number[]): number {
    return Math.sqrt(a.reduce((sum, val, i) => sum + (val - b[i]) ** 2, 0));
  }

  private initializeCentroids(X: number[][]): void {
    // K-Means++ initialization: first centroid uniformly at random, each
    // subsequent centroid sampled with probability proportional to its
    // squared distance from the nearest existing centroid
    this.centroids = [];
    this.centroids.push(X[Math.floor(Math.random() * X.length)]);

    for (let i = 1; i < this.k; i++) {
      const distances = X.map(x => {
        const minDist = Math.min(...this.centroids.map(c => this.euclideanDistance(x, c)));
        return minDist * minDist;
      });

      const sum = distances.reduce((a, b) => a + b, 0);
      let r = Math.random() * sum;

      for (let j = 0; j < X.length; j++) {
        r -= distances[j];
        if (r <= 0) {
          this.centroids.push(X[j]);
          break;
        }
      }
    }
  }

  fit(X: number[][]): void {
    this.initializeCentroids(X);

    for (let iter = 0; iter < this.maxIters; iter++) {
      // Assignment step: each point goes to its nearest centroid
      const newLabels = X.map(x => {
        const distances = this.centroids.map(c => this.euclideanDistance(x, c));
        return distances.indexOf(Math.min(...distances));
      });

      // Converged when no assignment changes between iterations
      if (JSON.stringify(newLabels) === JSON.stringify(this.labels)) break;
      this.labels = newLabels;

      // Update step: move each centroid to the mean of its assigned points
      // (empty clusters keep their previous centroid)
      for (let c = 0; c < this.k; c++) {
        const clusterPoints = X.filter((_, i) => this.labels[i] === c);
        if (clusterPoints.length > 0) {
          this.centroids[c] = clusterPoints[0].map((_, dim) =>
            clusterPoints.reduce((sum, p) => sum + p[dim], 0) / clusterPoints.length
          );
        }
      }
    }
  }

  predict(X: number[][]): number[] {
    return X.map(x => {
      const distances = this.centroids.map(c => this.euclideanDistance(x, c));
      return distances.indexOf(Math.min(...distances));
    });
  }

  inertia(X: number[][]): number {
    return X.reduce((sum, x, i) =>
      sum + this.euclideanDistance(x, this.centroids[this.labels[i]]) ** 2, 0
    );
  }
}

Deep Dive
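Choosing k is commonly done with the elbow method: run the algorithm for increasing k and look for the point where inertia stops dropping sharply. A self-contained sketch, using a standalone deterministic variant of Lloyd's algorithm (centroids seeded from the first k points instead of K-Means++, so the run is reproducible); the dataset, helper names (dist2, lloyd), and k range are all illustrative:

```typescript
type Point = number[];

// Squared Euclidean distance (no sqrt needed for comparisons or WCSS).
const dist2 = (a: Point, b: Point): number =>
  a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);

function lloyd(X: Point[], k: number, iters = 50): { centroids: Point[]; inertia: number } {
  const centroids = X.slice(0, k).map(p => [...p]); // deterministic seeding
  let labels: number[] = [];
  for (let it = 0; it < iters; it++) {
    // Assignment step: index of the nearest centroid for each point.
    labels = X.map(x => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist2(x, centroids[c]) < dist2(x, centroids[best])) best = c;
      }
      return best;
    });
    // Update step: each centroid becomes the mean of its assigned points.
    for (let c = 0; c < k; c++) {
      const pts = X.filter((_, i) => labels[i] === c);
      if (pts.length > 0) {
        centroids[c] = pts[0].map((_, d) => pts.reduce((s, p) => s + p[d], 0) / pts.length);
      }
    }
  }
  const inertia = X.reduce((s, x, i) => s + dist2(x, centroids[labels[i]]), 0);
  return { centroids, inertia };
}

// Two well-separated blobs: inertia drops sharply from k=1 to k=2,
// then flattens — the "elbow" suggests k=2.
const X: Point[] = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]];
for (const k of [1, 2, 3]) {
  console.log(`k=${k} inertia=${lloyd(X, k).inertia.toFixed(2)}`);
}
```

The fixed seeding is only there to make the sketch reproducible; in practice K-Means++ (as in the class above) with several restarts gives much better initial centroids.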
Theoretical Foundation
K-Means alternates between assigning each point to its nearest centroid and updating each centroid to the mean of its assigned points, which monotonically decreases the within-cluster sum of squares (WCSS). Results are sensitive to initialization (mitigated by K-Means++), the choice of k, and outliers, and the method implicitly assumes roughly spherical clusters of similar size.
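The WCSS objective can be made concrete in a few lines. A minimal sketch that evaluates WCSS for a fixed labeling (the wcss helper and the four-point dataset are illustrative), showing how a split that respects the two blobs scores far lower than one that mixes them:

```typescript
type Point = number[];

// WCSS: for each cluster, sum the squared distances of its points
// to the cluster mean, then total across clusters.
function wcss(X: Point[], labels: number[], k: number): number {
  let total = 0;
  for (let c = 0; c < k; c++) {
    const pts = X.filter((_, i) => labels[i] === c);
    if (pts.length === 0) continue;
    const mean = pts[0].map((_, d) => pts.reduce((s, p) => s + p[d], 0) / pts.length);
    total += pts.reduce(
      (s, p) => s + p.reduce((q, v, d) => q + (v - mean[d]) ** 2, 0), 0
    );
  }
  return total;
}

const X: Point[] = [[0, 0], [0, 2], [10, 10], [10, 12]];
console.log(wcss(X, [0, 0, 1, 1], 2)); // good split: pairs nearby points → 4
console.log(wcss(X, [0, 1, 0, 1], 2)); // bad split: mixes the blobs → 200
```

K-Means never evaluates labelings exhaustively; the assignment/update alternation is a local search over this objective, which is why initialization matters.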
Complexity
Time: O(nki), where n is the number of points, k the number of clusters, and i the number of iterations
Space: O(n + k)
Applications
Industry Use
Customer segmentation for marketing
Image compression and quantization
Market research and analysis
Gene sequencing and bioinformatics
Document clustering and organization
Anomaly detection preprocessing
Color palette reduction in images
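Color-palette reduction is one of the simplest applications: cluster the pixel colors, then replace every pixel with its nearest centroid. A sketch of the replacement step, assuming the palette (the fitted centroids) is already available; the quantize helper, palette, and pixel values are illustrative:

```typescript
type RGB = [number, number, number];

// Map each pixel to the nearest palette color (squared distance in RGB space).
function quantize(pixels: RGB[], palette: RGB[]): RGB[] {
  return pixels.map(px => {
    let best = palette[0];
    let bestD = Infinity;
    for (const c of palette) {
      const d = (px[0] - c[0]) ** 2 + (px[1] - c[1]) ** 2 + (px[2] - c[2]) ** 2;
      if (d < bestD) { bestD = d; best = c; }
    }
    return best;
  });
}

// A 2-color palette (near-black, near-white) applied to four pixels:
// dark pixels snap to [10,10,10], light pixels to [245,245,245].
const palette: RGB[] = [[10, 10, 10], [245, 245, 245]];
const pixels: RGB[] = [[0, 0, 0], [30, 20, 25], [250, 250, 240], [200, 210, 220]];
console.log(quantize(pixels, palette));
```

With k palette colors, each pixel can then be stored as a small palette index instead of three channel values, which is the basis of the compression.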
Related Algorithms
K-Nearest Neighbors (KNN)
A simple, instance-based learning algorithm that classifies new data points based on the k closest training examples. A non-parametric, lazy learning method used for classification and regression.
Linear Regression
A fundamental supervised learning algorithm modeling the relationship between a dependent variable and independent variables with a linear equation. Foundational for predictive modeling and statistics.
Logistic Regression
A binary classification algorithm using the sigmoid function to model the probability of class membership. Despite the name, it performs classification, not regression. A foundation for neural networks.
Decision Tree
A tree-based model that makes decisions through a sequence of if-else questions, splitting the data on feature values to build a hierarchical structure. Interpretable and able to capture non-linear relationships.