Naive Bayes Classifier
A probabilistic classifier based on Bayes' theorem with strong independence assumptions between features. Fast, simple, and effective for text classification. 'Naive' refers to the assumption that features are conditionally independent given the class.
Implementation
class NaiveBayes {
  private classPriors: Map<number, number> = new Map();
  private featureProbabilities: Map<number, Map<number, Map<any, number>>> = new Map();

  fit(X: any[][], y: number[]): void {
    const n = X.length;
    const classes = [...new Set(y)];

    // Calculate class priors P(C)
    for (const c of classes) {
      const count = y.filter(label => label === c).length;
      this.classPriors.set(c, count / n);
    }

    // Calculate feature probabilities P(x|C)
    for (const c of classes) {
      this.featureProbabilities.set(c, new Map());
      const classIndices = y.map((label, i) => label === c ? i : -1).filter(i => i >= 0);

      for (let featureIdx = 0; featureIdx < X[0].length; featureIdx++) {
        const featureMap = new Map<any, number>();
        const featureValues = classIndices.map(i => X[i][featureIdx]);

        for (const value of featureValues) {
          featureMap.set(value, (featureMap.get(value) || 0) + 1);
        }

        // Convert counts to probabilities with Laplace smoothing
        const total = featureValues.length;
        const uniqueValues = new Set(X.map(row => row[featureIdx])).size;

        for (const [value, count] of featureMap) {
          featureMap.set(value, (count + 1) / (total + uniqueValues));
        }

        this.featureProbabilities.get(c)!.set(featureIdx, featureMap);
      }
    }
  }

  predict(X: any[][]): number[] {
    return X.map(x => this.predictOne(x));
  }

  private predictOne(x: any[]): number {
    let maxProb = -Infinity;
    let bestClass = 0;

    // Compare log posteriors: summing logs avoids floating-point underflow
    // from multiplying many small probabilities
    for (const [c, prior] of this.classPriors) {
      let logProb = Math.log(prior);

      for (let i = 0; i < x.length; i++) {
        const featureMap = this.featureProbabilities.get(c)!.get(i)!;
        const prob = featureMap.get(x[i]) || 1e-6; // Smoothing for unseen values
        logProb += Math.log(prob);
      }

      if (logProb > maxProb) {
        maxProb = logProb;
        bestClass = c;
      }
    }

    return bestClass;
  }
}
Deep Dive
Theoretical Foundation
Applies Bayes' theorem: P(C|X) = P(X|C)P(C)/P(X). The naive assumption treats features as conditionally independent given the class: P(X|C) = ∏ᵢP(xᵢ|C). Common variants: Gaussian (continuous features), Multinomial (counts), Bernoulli (binary features). Despite the naive assumption, it works well in practice, especially on high-dimensional sparse data such as text.
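To make the formula concrete, here is a small worked example of the posterior computation for a two-class spam filter. All priors and word likelihoods below are made-up numbers for illustration, not learned from data:

```typescript
// Hypothetical priors and per-word likelihoods P(word | class)
const priors = { spam: 0.4, ham: 0.6 };
const likelihood: Record<string, Record<string, number>> = {
  spam: { free: 0.30, meeting: 0.05 },
  ham:  { free: 0.02, meeting: 0.20 },
};

// Unnormalized posterior: P(C | words) ∝ P(C) · ∏ P(word | C)
function score(c: "spam" | "ham", words: string[]): number {
  return words.reduce((p, w) => p * likelihood[c][w], priors[c]);
}

const words = ["free", "meeting"];
const sSpam = score("spam", words); // 0.4 · 0.30 · 0.05 = 0.006
const sHam = score("ham", words);   // 0.6 · 0.02 · 0.20 = 0.0024
// Normalize (P(X) cancels): P(spam | words) = 0.006 / (0.006 + 0.0024) ≈ 0.714
console.log(sSpam / (sSpam + sHam));
```

Note that the denominator P(X) never needs to be computed explicitly for classification: it is the same for every class, so comparing unnormalized scores (or their logarithms, as in the implementation above) is enough.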
Complexity
Time
O(nmk) train, O(mk) per prediction (n samples, m features, k classes)
Space
O(mkv) (v distinct values per feature)
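The implementation above handles categorical features by counting values. For continuous features, the Gaussian variant instead models each P(xᵢ|C) as a normal density with a per-class mean and variance. A minimal sketch of those two pieces (standalone helpers under that assumption, not part of the class above):

```typescript
// Normal density N(x; mean, variance), used as the likelihood P(x | C)
function gaussianPdf(x: number, mean: number, variance: number): number {
  const coeff = 1 / Math.sqrt(2 * Math.PI * variance);
  return coeff * Math.exp(-((x - mean) ** 2) / (2 * variance));
}

// Per-class, per-feature parameter estimation from training values
function meanAndVariance(values: number[]): { mean: number; variance: number } {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length + 1e-9; // guard against zero variance
  return { mean, variance };
}

// Example: fit one feature for one class, then score a new observation
const { mean, variance } = meanAndVariance([4.9, 5.1, 5.0, 4.8]);
console.log(Math.log(gaussianPdf(5.0, mean, variance))); // log-likelihood contribution
```

In a full Gaussian Naive Bayes, `Math.log(gaussianPdf(...))` would replace the `Math.log(prob)` term inside `predictOne`, with one (mean, variance) pair stored per class per feature.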
Applications
Industry Use
Email spam filtering
Text classification and sentiment analysis
Medical diagnosis systems
Real-time predictions
Recommendation systems
News categorization
Social media content filtering
Related Algorithms
K-Nearest Neighbors (KNN)
Simple, instance-based learning algorithm that classifies new data points based on the k closest training examples. A non-parametric, lazy learning method used for classification and regression.
Linear Regression
Fundamental supervised learning algorithm modeling the relationship between a dependent variable and independent variables using a linear equation. Foundational for predictive modeling and statistics.
Logistic Regression
Binary classification algorithm using the sigmoid function to model the probability of class membership. Despite its name, it performs classification, not regression. A foundation for neural networks.
Decision Tree
Tree-based model that makes decisions through a sequence of if-else questions, splitting data on feature values to build a hierarchical structure. Interpretable and handles non-linear relationships.