
Probability Basics: From Coin Flips to Classifiers

Events, outcomes, the complement rule, joint and conditional probability. Why every classifier output is a conditional probability.


What Is Probability?

Probability is a number between 0 and 1 that measures how likely something is to happen.

Every ML model is, at its core, a probability machine. A classifier does not say "this is a cat" — it says "there is an 87% chance this is a cat." A language model assigns probabilities to every possible next word. Before you can interpret, train, or evaluate any of these models, you need a firm grasp of what probability actually means.

  • When P = 0: impossible. P(\text{rolling a 7 on a standard die}) = 0.
  • When P = 1: certain. P(\text{rolling} \leq 6) = 1.
  • Everything else is in between. P(\text{rolling a 3}) = 1/6 \approx 0.167.

Think of probability as a fraction: how many of the possible outcomes are favorable? For a fair coin, 1 of 2 outcomes is heads, so P(\text{heads}) = 1/2.

P(A) \in [0, 1] \qquad P(\Omega) = 1

P(A) - probability of event A, always between 0 and 1

\Omega is the sample space - the complete set of everything that could happen. Its probability is always 1.
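
To make the fraction view concrete, here is a minimal Python sketch (standard library only) that computes a probability by counting favorable outcomes over the whole sample space of a fair die:

# Probability as a fraction: favorable outcomes / total outcomes
sample_space = [1, 2, 3, 4, 5, 6]                  # Ω for one roll of a fair die

favorable = [o for o in sample_space if o == 3]    # outcomes where the event happens
p_three = len(favorable) / len(sample_space)
print(f"P(rolling a 3) = {p_three:.3f}")           # → 0.167

# The full sample space is certain, so its probability is 1
print(len(sample_space) / len(sample_space))       # → 1.0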

The Complement Rule

P(\overline{A}) = 1 - P(A)

A - event of interest
P(\overline{A}) - probability that A does NOT happen

If P(\text{rain}) = 0.3, then P(\text{no rain}) = 0.7. The two must sum to 1 - something either happens or it does not.

This is more useful than it sounds. Sometimes P(\text{not } A) is far easier to compute. "Probability of at least one head in 10 flips" sounds complicated until you flip it: 1 - P(\text{zero heads}) = 1 - (0.5)^{10} \approx 0.999.
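
Here is a short sketch of that trick, checking the closed-form answer against a brute-force enumeration of all 2^10 flip sequences:

from itertools import product

# Complement rule: P(at least one head in 10 flips) = 1 - P(zero heads)
n = 10
p_at_least_one = 1 - 0.5 ** n
print(f"1 - P(zero heads) = {p_at_least_one:.4f}")         # → 0.9990

# Brute-force cross-check: enumerate all 2^10 sequences of H/T
sequences = list(product("HT", repeat=n))
hits = sum(1 for seq in sequences if "H" in seq)
print(f"direct enumeration = {hits / len(sequences):.4f}") # → 0.9990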

Interactive example (coming soon): complement rule visualizer - adjust P(A) and see P(not A) update.

Joint Probability: AND

The joint probability P(A \text{ and } B) is the probability that both A and B happen simultaneously.

When A and B are independent (knowing one tells you nothing about the other):

P(A \cap B) = P(A) \cdot P(B)

P(A) - probability of A
P(B) - probability of B

Example: P(\text{heads on flip 1 AND heads on flip 2}) = 0.5 \times 0.5 = 0.25.
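
A quick simulation makes the product rule tangible - the empirical frequency should hover near the 0.25 the formula predicts (the seed below is an arbitrary choice, purely for reproducibility):

import random

random.seed(42)                     # arbitrary seed for reproducibility
trials = 100_000
both_heads = 0
for _ in range(trials):
    flip1 = random.random() < 0.5   # flip 1 comes up heads?
    flip2 = random.random() < 0.5   # flip 2 comes up heads?
    if flip1 and flip2:
        both_heads += 1

print(f"empirical P(H and H) = {both_heads / trials:.3f}")   # ≈ 0.25
print(f"product rule         = {0.5 * 0.5:.3f}")             # → 0.250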

When events are not independent - consecutive rainy days are correlated - this formula breaks down. For dependent events:

P(A \cap B) = P(A) \cdot P(B \mid A)

P(B \mid A) - probability of B given A has occurred
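
A sketch of the dependent case, using made-up numbers for the rain example (both probabilities below are illustrative assumptions, not data):

# Dependent events: P(rain day 1 AND rain day 2) = P(rain day 1) * P(rain day 2 | rain day 1)
p_rain_day1 = 0.3               # assumed
p_rain_day2_given_rain = 0.6    # assumed: rainy days cluster, so this exceeds 0.3
p_both_days = p_rain_day1 * p_rain_day2_given_rain
print(f"P(rain both days) = {p_both_days:.2f}")   # → 0.18

# The independence shortcut would wrongly give 0.3 * 0.3 = 0.09, half the true value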

This brings us to the most important concept in probability for ML.

Conditional Probability: GIVEN

The conditional probability P(A \mid B) - read "probability of A given B" - means: IF B has already happened, what is the probability of A?

P(A \mid B) = \frac{P(A \cap B)}{P(B)}

P(A \cap B) - probability that both A and B occur
P(B) - probability that B occurs

You restrict to the world where B happened (divide by P(B)), then look at what fraction of that world also has A.

Example: P(\text{rolling 4} \mid \text{rolling even})?

  • We know it is even, so only {2, 4, 6} are possible (the sample space shrinks).
  • Of those three, only {4} gives us 4.
  • So P(4 \mid \text{even}) = 1/3.

Compare to P(\text{rolling 4}) = 1/6. Learning that the roll is even doubled the probability, because it cut the sample space in half.
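
The same computation in code - restrict the sample space to the world where B happened, then count what fraction of it also satisfies A:

# P(rolling 4 | rolling even): condition by shrinking the sample space
sample_space = [1, 2, 3, 4, 5, 6]

given_even = [o for o in sample_space if o % 2 == 0]    # B happened: [2, 4, 6]
p_4_given_even = sum(1 for o in given_even if o == 4) / len(given_even)
print(f"P(4 | even) = {p_4_given_even:.3f}")            # → 0.333

p_4 = sum(1 for o in sample_space if o == 4) / len(sample_space)
print(f"P(4)        = {p_4:.3f}")                       # → 0.167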

The Addition Rule

For events that can overlap:

P(A \cup B) = P(A) + P(B) - P(A \cap B)

P(A \cup B) - probability that A or B or both occur

Subtract the intersection to avoid counting it twice. If A and B are mutually exclusive (cannot both happen), P(A \cap B) = 0 and the formula simplifies to P(A) + P(B).
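
A sketch verifying inclusion-exclusion on a die, with A = "even" and B = "greater than 3" (two overlapping events chosen purely for illustration):

# P(A ∪ B) = P(A) + P(B) - P(A ∩ B) on a fair die
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {4, 5, 6}   # greater than 3

def p(event):
    return len(event) / len(sample_space)

direct = p(A | B)                            # count the union directly
inclusion_exclusion = p(A) + p(B) - p(A & B)
print(f"{direct:.3f} vs {inclusion_exclusion:.3f}")   # → 0.667 vs 0.667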

Independence

Events A and B are independent when knowing B gives you no information about A:

A \perp B \iff P(A \mid B) = P(A)

P(A \mid B) - probability of A given B
P(A) - unconditional probability of A

Coin flips are independent - each flip starts fresh. But tomorrow's weather is not independent of today's weather. In ML, the i.i.d. (independent and identically distributed) assumption says that training examples are independent of each other. This assumption drives much of ML theory, and it is often violated in practice (think: time series, correlated samples, users in a recommendation system). Knowing when it is violated helps you anticipate failure modes.

# Computing probabilities from data
outcomes = ['H', 'T', 'H', 'H', 'T', 'H', 'T', 'T', 'H', 'H']

# Empirical probability (count / total)
p_heads = outcomes.count('H') / len(outcomes)
print(f"P(H) = {p_heads:.2f}")   # → 0.60

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
# Example: given a patient tests positive, what's the probability they're sick?
p_positive_given_sick    = 0.95
p_sick                   = 0.01
p_positive_given_healthy = 0.05

p_positive = p_positive_given_sick * p_sick + p_positive_given_healthy * (1 - p_sick)  # law of total probability
p_sick_given_positive = (p_positive_given_sick * p_sick) / p_positive
print(f"P(sick | positive) = {p_sick_given_positive:.4f}")   # → ~0.16 (Bayes!)

# Independence check: is P(A ∩ B) == P(A) * P(B)?
import math
p_a, p_b = 0.3, 0.5
p_a_and_b = 0.15   # joint probability of A and B (assumed, e.g. measured from data)
print(math.isclose(p_a_and_b, p_a * p_b))   # → True: consistent with independence

Interactive example (coming soon): conditional probability explorer - see how restricting the sample space changes probabilities.

Quiz

Question 1 of 3

If P(rain today) = 0.3, what is P(no rain today)?