Why We Need a Shorthand for "Add Everything Up"
In machine learning, you're almost always doing something to every example in your dataset, or to every parameter in your model. If you have 10,000 training examples and you want to write "add up the error for each one," you cannot write out 10,000 terms by hand. You need a shorthand.
That shorthand is the Σ (capital sigma) notation - the mathematical equivalent of a for-loop.
Reading the Sigma Symbol
Here's the full notation:
- Σᵢ₌₁ⁿ xᵢ - "sum of xᵢ for i from 1 to n": add x₁ plus x₂ plus … plus xₙ
The pieces:
- Σ - "add up what follows"
- i=1 below the sigma - "start the counter at 1"
- n above the sigma - "stop when i reaches n"
- xᵢ to the right - "the thing you're adding each time"
It's literally a recipe for a loop: start at the bottom, go to the top, add the expression on the right at every step.
Worked Examples
Example 1: Σᵢ₌₁³ xᵢ = x₁ + x₂ + x₃ = 2 + 5 + 3 = 10 when x₁ = 2, x₂ = 5, x₃ = 3
Example 2: Σᵢ₌₁⁴ i = 1 + 2 + 3 + 4 = 10 (the thing being added is just i itself)
Example 3: Σᵢ₌₁³ i² = 1 + 4 + 9 = 14 (square each value of i first, then add)
The expression to the right of Σ can be as complex as you like - you evaluate it at each value of i and sum the results.
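A quick way to internalize this is to let Python do the expansion. Here's a minimal sketch in plain Python (no libraries needed); the bounds and values match the worked examples above:

# Example 1: Σᵢ₌₁³ xᵢ with x₁=2, x₂=5, x₃=3
x = [2, 5, 3]
print(sum(x))                          # 10

# Example 2: Σᵢ₌₁⁴ i - the summand is just i
print(sum(i for i in range(1, 5)))     # 1+2+3+4 = 10

# Example 3: Σᵢ₌₁³ i² - square first, then add
print(sum(i**2 for i in range(1, 4)))  # 1+4+9 = 14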
The Mean as a Summation
Here's a formula you already know, dressed up in sigma notation:
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ
- x̄ - the mean (average) of the x values
- n - the number of values
That's just: "add up all the values, then divide by n." Which is exactly how you compute an average. In ML, your loss function is usually an average over your training examples. Sigma notation lets you write that crisply.
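In code, that translation is one line. A tiny sketch with made-up values:

x = [4, 8, 6]           # made-up values
mean = sum(x) / len(x)  # add up all the values, then divide by n
print(mean)             # 6.0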
Nested Sums and Shorthand
Sometimes you'll drop the limits when they're obvious from context: Σᵢ xᵢ (without the i=1 and n) means "sum over all values of i."
You'll also see nested sums like Σᵢ Σⱼ aᵢⱼ - "for each i, for each j, add aᵢⱼ." Think of a grid: you're summing every cell in a matrix.
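Nested sums translate to nested loops, one loop per Σ. A minimal sketch with a made-up 2×3 matrix:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # made-up grid of aᵢⱼ values

# Σᵢ Σⱼ aᵢⱼ - one loop per sigma
total = 0
for i in range(A.shape[0]):        # outer sum: for each row i
    for j in range(A.shape[1]):    # inner sum: for each column j
        total += A[i, j]

print(total)       # 21
print(np.sum(A))   # 21 - the vectorized version sums every cell at once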
Product Notation: Sigma's Multiplication Twin
The capital Π (pi) does for multiplication what Σ does for addition:
- Πᵢ₌₁ⁿ aᵢ - "product of aᵢ for i from 1 to n": multiply a₁ times a₂ times … times aₙ
Π shows up in probability, where you often multiply likelihoods together. If five independent events have probabilities p₁, p₂, …, p₅, the probability they all happen is:
Πᵢ₌₁⁵ pᵢ = p₁ · p₂ · p₃ · p₄ · p₅
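In NumPy the multiply-loop is np.prod. A sketch with five made-up probabilities:

import numpy as np

p = np.array([0.9, 0.8, 0.95, 0.7, 0.85])  # made-up probabilities p₁…p₅

print(np.prod(p))  # Πᵢ₌₁⁵ pᵢ ≈ 0.407 - the chance all five events happen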
Where Sums Appear in Machine Learning
Sigma notation is everywhere once you start reading ML papers. Here are the four places you'll see it most:
1. Loss functions. The mean squared error, for example:
MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
- n - number of training examples
- yᵢ - true label for example i
- ŷᵢ - predicted label for example i
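As a sketch, here's that average written both as the literal loop and as vectorized NumPy, with made-up labels and predictions:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0])  # yᵢ - made-up true labels
y_pred = np.array([2.5,  0.0, 2.0])  # ŷᵢ - made-up predictions
n = len(y_true)

# (1/n) Σᵢ (yᵢ − ŷᵢ)² as an explicit loop
mse = sum((y_true[i] - y_pred[i]) ** 2 for i in range(n)) / n

print(mse)                               # 0.1666...
print(np.mean((y_true - y_pred) ** 2))   # same thing, vectorized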
2. The dot product. Every neuron computes a weighted sum of its inputs:
z = Σᵢ₌₁ⁿ wᵢxᵢ
- z - the weighted sum (pre-activation)
- wᵢ - weight for input i
- xᵢ - i-th input feature
This is the dot product - the most computationally expensive operation in deep learning, and it's just sigma notation.
3. Softmax denominator. The softmax function turns raw scores into probabilities. The denominator is:
Σⱼ₌₁ᴷ e^zⱼ
- zⱼ - raw score for class j
- K - total number of classes
"Sum of e^zⱼ for each class j" - this is what ensures all output probabilities add up to 1.
4. Expected value. For a discrete random variable X with probability P(X = x):
E[X] = Σₓ x · P(X = x)
- E[X] - the expected value of X - a weighted average of possible values, weighted by their probabilities
"For each possible value x, multiply x by its probability, then sum." Used in reinforcement learning, Bayesian inference, and every probabilistic ML method.
The Mental Model: Sigma = For-Loop
Whenever you see Σ, mentally translate it as:
# Explicit for-loop (matches the math directly)
x = [2, 5, 3]                   # x₁=2, x₂=5, x₃=3
total = 0
for i in range(1, len(x) + 1):  # i goes from 1 to n
    total += x[i - 1]           # the expression at i (x is 0-indexed in Python)
print(total)                    # 10
# NumPy vectorized version (what you'll actually write)
import numpy as np
x = np.array([2, 5, 3]) # x₁=2, x₂=5, x₃=3
total = np.sum(x) # → 10 (Σxᵢ)
mean = np.mean(x) # → 10/3 ≈ 3.33 (x̄ = (1/n)Σxᵢ)
# Weighted sum (dot product) — the fundamental neuron operation
w = np.array([0.4, -0.2, 0.9])
z = np.dot(w, x) # → Σwᵢxᵢ = 0.4·2 + (-0.2)·5 + 0.9·3 = 2.5
That's it. Same for Π — it's a multiply-loop instead of an add-loop.
Don't let the symbols intimidate you. Behind every Σ is just a list of numbers being added up.