Lesson 3 ⏱ 8 min

Sums and products (Σ and Π)


Summation Notation - Math's For-Loop

Introduces sigma and pi notation as shorthand for loops, works through concrete examples, and shows where summation appears in loss functions, dot products, and softmax.


Quick refresher

For-loops and accumulation

A for-loop iterates over a range and accumulates a result. Sigma notation is the mathematical equivalent - it specifies the start, end, and expression to accumulate.

Example

total = 0; for i in range(1, 4): total += x[i] is the same as $\sum_{i=1}^{3} x_i$.
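Spelled out as runnable Python (a minimal sketch with made-up values; note that Python lists are 0-indexed, so the math's $x_1$ lives at x[0]):

x = [2, 5, 3]             # x₁=2, x₂=5, x₃=3

total = 0
for i in range(1, 4):     # i runs 1, 2, 3 - the limits under and over the Σ
    total += x[i - 1]     # shift by one because Python indexes from 0

print(total)              # → 10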

Why We Need a Shorthand for "Add Everything Up"

In machine learning, you're almost always doing something to every example in your dataset, or to every parameter in your model. If you have 10,000 training examples and you want to write "add up the error for each one," you cannot write 10,000 plus signs. You need a shorthand.

That shorthand is the Σ (capital sigma) notation - the mathematical equivalent of a for-loop.

Reading the Sigma Symbol

Here's the full notation:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

Read it as: "the sum of $x_i$ for $i$ from 1 to $n$" - add $x_1$ plus $x_2$ plus ... plus $x_n$.

The pieces:

  • Σ - "add up what follows"
  • i=1 below the sigma - "start the counter at 1"
  • n above the sigma - "stop when i reaches n"
  • xᵢ to the right - "the thing you're adding each time"

It's literally a recipe for a loop: start at the bottom, go to the top, add the expression on the right at every step.


Worked Examples

Example 1: $\sum_{i=1}^{3} x_i$ when $x_1=2, x_2=5, x_3=3$

$= x_1 + x_2 + x_3 = 2 + 5 + 3 = \mathbf{10}$

Example 2: $\sum_{i=1}^{4} i$ (the thing being added is just $i$ itself)

$= 1 + 2 + 3 + 4 = \mathbf{10}$

Example 3: $\sum_{i=1}^{5} i^2$ (square each value of $i$ first, then add)

$= 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = \mathbf{55}$

The expression to the right of Σ can be as complex as you like - you evaluate it at each value of i and sum the results.
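If you want to check these in Python, here's a quick sketch (NumPy for the first one, plain built-ins for the rest):

import numpy as np

x = np.array([2, 5, 3])

example_1 = np.sum(x)                       # Σ xᵢ, i=1..3  → 10
example_2 = sum(range(1, 5))                # Σ i,  i=1..4  → 10
example_3 = sum(i**2 for i in range(1, 6))  # Σ i², i=1..5  → 55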

The Mean as a Summation

Here's a formula you already know, dressed up in sigma notation:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where $\bar{x}$ ("x-bar") is the mean (average) of the $x$ values and $n$ is the number of values.

That's just: "add up all the $x_i$ values, then divide by $n$." Which is exactly how you compute an average. In ML, your loss function is usually an average over your training examples. Sigma notation lets you write that crisply.
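A quick sanity check in NumPy (a sketch reusing the toy values from Example 1; np.mean does the divide-by-n for you):

import numpy as np

x = np.array([2, 5, 3])
n = len(x)

mean_manual = np.sum(x) / n   # (1/n) Σ xᵢ → 10/3 ≈ 3.33
mean_numpy  = np.mean(x)      # same thing in one call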

Nested Sums and Shorthand

Sometimes you'll drop the limits when they're obvious from context. $\sum_i x_i$ (without the $i=1$ and $n$) means "sum over all values of $i$."

You'll also see nested sums like $\sum_i \sum_j x_{ij}$ - "for each $i$, for each $j$, add $x_{ij}$." Think of a grid: you're summing every cell in a matrix.
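In code, a nested sum is a nested loop - or a single call on a 2-D array. A minimal sketch with a made-up matrix:

import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])        # xᵢⱼ values in a 2×3 grid

# Σᵢ Σⱼ xᵢⱼ as an explicit double loop
total = 0
for i in range(X.shape[0]):      # over rows
    for j in range(X.shape[1]):  # over columns
        total += X[i, j]         # → 21

# Vectorized: sum every cell at once
total_np = X.sum()               # → 21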

Product Notation: Sigma's Multiplication Twin

The capital Π (pi) does for multiplication what Σ does for addition:

$$\prod_{i=1}^{n} a_i = a_1 \times a_2 \times \cdots \times a_n$$

Read it as: "the product of $a_i$ for $i$ from 1 to $n$" - multiply $a_1$ times $a_2$ times ... times $a_n$. For example, with $a_1=2, a_2=3, a_3=4$:

$$\prod_{i=1}^{3} a_i = 2 \times 3 \times 4 = 24$$

Π shows up in probability, where you often multiply likelihoods together. If five independent events have probabilities $p_1=0.9, p_2=0.8, p_3=0.95, p_4=0.7, p_5=0.85$, the probability they all happen is:

$$\prod_{i=1}^{5} p_i = 0.9 \times 0.8 \times 0.95 \times 0.7 \times 0.85 \approx 0.407$$
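The NumPy equivalent of Π is np.prod. A quick sketch confirming the number above:

import numpy as np

p = np.array([0.9, 0.8, 0.95, 0.7, 0.85])

all_happen = np.prod(p)   # Π pᵢ = 0.40698... ≈ 0.407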

Where Sums Appear in Machine Learning

Sigma notation is everywhere once you start reading ML papers. Here are the four places you'll see it most:

1. Loss functions. The mean squared error (MSE) loss averages the squared prediction error over all training examples:

$$L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $n$ is the number of training examples, $y_i$ is the true label for example $i$, and $\hat{y}_i$ is the predicted label for example $i$.
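As a sketch with toy numbers (three training examples, made-up labels and predictions):

import numpy as np

y_true = np.array([3.0, 5.0, 2.0])      # yᵢ, true labels
y_pred = np.array([2.5, 5.5, 2.0])      # ŷᵢ, predictions

mse = np.mean((y_true - y_pred) ** 2)   # (1/n) Σ (yᵢ - ŷᵢ)² → 0.1667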

2. The dot product. Every neuron computes a weighted sum of its inputs:

$$z = \sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

where $z$ is the weighted sum (pre-activation), $w_i$ is the weight for input $i$, and $x_i$ is the $i$-th input feature.

This is the dot product $\mathbf{w} \cdot \mathbf{x}$ - the most computationally expensive operation in deep learning, and it's just sigma notation.

3. Softmax denominator. The softmax function turns raw scores into probabilities, and the denominator is where the sum appears:

$$\text{softmax}(z_k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$

where $z_j$ is the raw score for class $j$ and $K$ is the total number of classes. The denominator - "the sum of $e^{z_j}$ over every class $j$" - ensures all the output probabilities add up to 1.

4. Expected value. For a discrete random variable $X$ with probabilities $P(X=x)$:

$$\mathbb{E}[X] = \sum_x x \cdot P(X = x)$$

where $\mathbb{E}[X]$ is the expected value of $X$ - a weighted average of the possible values, weighted by their probabilities.

"For each possible value x, multiply x by its probability, then sum." Used in reinforcement learning, Bayesian inference, and every probabilistic ML method.

The Mental Model: Sigma = For-Loop

Whenever you see Σ, mentally translate it as:

# Explicit for-loop (matches the math directly)
x = [2, 5, 3]               # x₁=2, x₂=5, x₃=3
n = len(x)
total = 0
for i in range(1, n + 1):   # i goes from 1 to n
    total += x[i - 1]       # x is 0-indexed in Python, the math is 1-indexed

# NumPy vectorized version (what you'll actually write)
import numpy as np

x = np.array([2, 5, 3])       # x₁=2, x₂=5, x₃=3
total = np.sum(x)             # → 10  (Σxᵢ)
mean  = np.mean(x)            # → 10/3 ≈ 3.33  (x̄ = (1/n)Σxᵢ)

# Weighted sum (dot product) — the fundamental neuron operation
w = np.array([0.4, -0.2, 0.9])
z = np.dot(w, x)              # → Σwᵢxᵢ = 0.4·2 + (-0.2)·5 + 0.9·3 = 2.5

That's it. Same for Π — it's a multiply-loop instead of an add-loop.

Don't let the symbols intimidate you. Behind every Σ is just a list of numbers being added up.

Quiz

1 / 3

What does Σᵢ₌₁³ xᵢ equal if x₁=1, x₂=3, x₃=5?