Importance of Probability and Statistics in Machine Learning
Statistics and probability are core foundations of machine learning. Just as linear algebra provides the mathematics of data, statistics provides the tools to analyze data, and probability provides the tools to model uncertainty in data.
Why are they important?
- Statistics helps us analyze, summarize, and understand data distributions.
- Probability allows us to model random events, build hypotheses, and evaluate machine learning models.
- Most machine learning models assume some underlying probability distribution for the data they are trained on.
- Concepts like hypothesis testing, significance, sampling, and model evaluation come from statistics and probability theory.
Types of Probability Distributions
Probability distributions describe how probabilities are assigned to values of a random variable. They are divided into:
1. Continuous Probability Distributions
Used when the variable can take any value within a range.
A. Normal Distribution (Gaussian)
- Most common and important continuous distribution in ML.
- Shaped like a bell curve, symmetric about the mean.
- Described by two parameters: mean (μ) and standard deviation (σ).
PDF (Probability Density Function):
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
- Mean = Median = Mode
- Useful in modeling real-world data like height, marks, and weight.
- Z-Score is used to normalize data:
z = (x − μ) / σ
- Normality Check: Done using a QQ plot, which compares the sample quantiles against the quantiles of a normal distribution; if the points fall roughly on a straight line, the data is approximately normal.
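As a quick sketch of z-score normalization in NumPy (the sample of simulated "marks" and its parameters are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated exam marks, roughly normal with mean 70 and standard deviation 10.
marks = rng.normal(loc=70, scale=10, size=1000)

# Z-score normalization: z = (x - mu) / sigma.
z = (marks - marks.mean()) / marks.std()

# After normalization the sample has mean ~0 and standard deviation ~1.
print(z.mean(), z.std())
```

After this transformation every value is expressed in "standard deviations from the mean", which puts features measured on different scales (height, marks, weight) on a common footing.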
B. Rectangular Distribution (Uniform Distribution)
- All values in a range [a, b] are equally likely.
PDF:
f(x) = 1 / (b − a), for a ≤ x ≤ b (0 otherwise)
- Used when every outcome in the range is equally likely (e.g., a random arrival time within an hour; a fair die is the discrete analogue).
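A minimal sampling sketch, assuming an illustrative interval [a, b] = [2, 8]; the sample mean should approach the midpoint (a + b) / 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from Uniform(a, b); the theoretical mean is (a + b) / 2 = 5.
a, b = 2.0, 8.0
samples = rng.uniform(a, b, size=100_000)

print(samples.mean())  # close to 5
```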
C. Exponential Distribution
- Describes the time between events in a Poisson process.
- A special case of the Gamma distribution.
PDF:
f(x) = λ · e^(−λx), for x ≥ 0
- λ is the rate parameter.
- Mean = Standard Deviation = 1/λ
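The mean = standard deviation = 1/λ property can be checked empirically; here λ = 2 is an illustrative choice (note that NumPy parameterizes the exponential by the scale 1/λ, not the rate λ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Exponential waiting times with rate lambda = 2, so mean = std = 1/2.
lam = 2.0
waits = rng.exponential(scale=1.0 / lam, size=200_000)

print(waits.mean(), waits.std())  # both close to 0.5
```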
2. Discrete Probability Distributions
Used when the variable takes specific separate values.
A. Binomial Distribution
- Models number of successes in n independent trials.
- Each trial is a Bernoulli trial (success/failure).
PMF (Probability Mass Function):
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), for k = 0, 1, …, n
- Mean (μ) = np
- Variance (σ²) = np(1 – p)
- Example: Tossing a coin 10 times and counting heads.
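The coin-toss example above can be simulated directly; the sample mean and variance should approach np = 5 and np(1 − p) = 2.5:

```python
import numpy as np

rng = np.random.default_rng(7)

# 10 fair-coin tosses per experiment; count the number of heads (successes).
n, p = 10, 0.5
heads = rng.binomial(n, p, size=100_000)

print(heads.mean(), heads.var())  # close to np = 5 and np(1 - p) = 2.5
```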
B. Poisson Distribution
- Models the number of events in a fixed interval of time or space.
PMF:
P(X = k) = (λ^k · e^(−λ)) / k!, for k = 0, 1, 2, …
- Mean (μ) = λ
- Variance (σ²) = λ
- Example: Number of emails received per hour.
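The defining property that mean = variance = λ is easy to verify by simulation; λ = 4 emails per hour is an illustrative rate, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(3)

# Emails received per hour, modeled as Poisson with lambda = 4.
lam = 4.0
emails = rng.poisson(lam, size=200_000)

# For a Poisson distribution, mean and variance both equal lambda.
print(emails.mean(), emails.var())
```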
C. Bernoulli Distribution
- Models a single experiment with only 2 outcomes: success (1) and failure (0).
PMF:
P(X = x) = p^x · (1 − p)^(1 − x), for x ∈ {0, 1}
- Mean = p
- Variance = p(1 – p)
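A Bernoulli trial is a binomial with n = 1, so it can be simulated the same way; p = 0.3 here is an illustrative success probability:

```python
import numpy as np

rng = np.random.default_rng(5)

# Single success/failure trials with success probability p = 0.3.
p = 0.3
trials = rng.binomial(1, p, size=100_000)  # Bernoulli = Binomial with n = 1

# Sample mean ~ p and sample variance ~ p(1 - p) = 0.21.
print(trials.mean(), trials.var())
```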