1.b) Explain Probability Distribution with example.
Answer:
Probability Distributions:
- Probability distributions are the foundations of statistical models.
- They are fundamental concepts in statistics and probability theory. In data science, understanding probability distributions is crucial for modeling and analyzing data.
- Probability Distribution: A probability distribution describes the likelihood of each possible outcome of a random variable. It assigns probabilities to different values that the variable can take.
- Examples: Normal (Gaussian) Distribution, Poisson Distribution, Weibull Distribution, Gamma Distribution, Exponential Distribution. Natural processes tend to generate measurements whose empirical shape can be approximated by mathematical functions with a few parameters that can be estimated from the data. Not all processes generate data that looks like a named distribution, but many do. These functions can be used as building blocks of our models.
- It is beyond the scope of the book to go into each of the distributions in detail, but they are shown in the figure below as an illustration of the various common shapes, and as a reminder that they only have names because someone observed them often enough to think they deserved names. There is actually an infinite number of possible distributions. Each one is to be interpreted as assigning a probability to a subset of possible outcomes, and each has a corresponding function. For example, the normal distribution with mean μ and standard deviation σ is written as: $p(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
- In a normal distribution, data tends to cluster around a central value with no bias to the left or right. It is a symmetric distribution that appears as a bell-shaped curve. The parameter μ is both the mean and the median, and it controls where the distribution is centered. The parameter σ controls how spread out the distribution is. This is the general functional form. For a specific real-world phenomenon, μ and σ take actual numeric values, which can be estimated from data.
- A random variable denoted by x or y can be assumed to have a corresponding probability distribution p(x), which maps each value x to a non-negative real number. For p(x) to be a probability density function, it must belong to the set of functions whose integral (the area under the curve) is 1, so that areas under the curve can be interpreted as probabilities.
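The points above about the normal distribution can be checked with a small sketch. It is only an illustration under stated assumptions: the function names (normal_pdf, area_under_pdf, estimate_parameters) and the simulated data (values drawn with mean 170 and standard deviation 8) are made up for this example and do not come from the source. The sketch estimates μ and σ from the data, then confirms numerically that the area under the fitted density is approximately 1.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_pdf(mu, sigma, steps=20_000, width=10.0):
    """Trapezoidal approximation of the integral of the density
    over [mu - width*sigma, mu + width*sigma]."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h


def estimate_parameters(samples):
    """Estimate mu and sigma as the sample mean and sample standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical data: 10,000 simulated measurements (assumed mu=170, sigma=8).
rng = random.Random(0)
samples = [rng.gauss(170.0, 8.0) for _ in range(10_000)]

mu_hat, sigma_hat = estimate_parameters(samples)
area = area_under_pdf(mu_hat, sigma_hat)

print(f"estimated mu    = {mu_hat:.3f}")
print(f"estimated sigma = {sigma_hat:.3f}")
print(f"area under pdf  = {area:.6f}")
```

The estimated parameters should land close to the values used to simulate the data, and the computed area should be very close to 1, which is the condition that lets p(x) be read as a probability density.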