Why We Need a Shorthand for "Add Everything Up"
In machine learning, you're almost always doing something to every example in your dataset, or to every parameter in your model. If you have 10,000 training examples and you want to write "add up the error for each one," you cannot write out 10,000 terms by hand. You need a shorthand.
That shorthand is the Σ (capital sigma) notation - the mathematical equivalent of a for-loop.
Reading the Sigma Symbol
Here's the full notation:
- Σᵢ₌₁ⁿ xᵢ - "sum of xᵢ for i from 1 to n": add x₁ plus x₂ plus … plus xₙ
The pieces:
- Σ - "add up what follows"
- i=1 below the sigma - "start the counter at 1"
- n above the sigma - "stop when i reaches n"
- xᵢ to the right - "the thing you're adding each time"
It's literally a recipe for a loop: start at the bottom, go to the top, add the expression on the right at every step.
Worked Examples
Example 1: Σᵢ₌₁³ xᵢ = x₁ + x₂ + x₃ = 2 + 5 + 3 = 10 when x₁ = 2, x₂ = 5, x₃ = 3
Example 2: Σᵢ₌₁⁴ i = 1 + 2 + 3 + 4 = 10 (the thing being added is just i itself)
Example 3: Σᵢ₌₁³ i² = 1 + 4 + 9 = 14 (square each value of i first, then add)
The expression to the right of Σ can be as complex as you like - you evaluate it at each value of i and sum the results.
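A quick way to internalize this is to let Python do the expansion. Here's a minimal sketch in plain Python (no libraries needed); the bounds and values match the worked examples above:

# Example 1: Σᵢ₌₁³ xᵢ with x₁=2, x₂=5, x₃=3
x = [2, 5, 3]
print(sum(x))                          # 10

# Example 2: Σᵢ₌₁⁴ i - the summand is just i
print(sum(i for i in range(1, 5)))     # 1+2+3+4 = 10

# Example 3: Σᵢ₌₁³ i² - square first, then add
print(sum(i**2 for i in range(1, 4)))  # 1+4+9 = 14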
The Mean as a Summation
Here's a formula you already know, dressed up in sigma notation:
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ
- x̄ - the mean (average) of the x values
- n - the number of values
That's just: "add up all the values, then divide by n." Which is exactly how you compute an average. In ML, your loss function is usually an average over your training examples. Sigma notation lets you write that crisply.
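In code, that translation is one line. A tiny sketch with made-up values:

x = [4, 8, 6]           # made-up values
mean = sum(x) / len(x)  # add up all the values, then divide by n
print(mean)             # 6.0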
Nested Sums and Shorthand
Sometimes you'll drop the limits when they're obvious from context: Σᵢ xᵢ (without the i=1 and n) means "sum over all values of i."
You'll also see nested sums like Σᵢ Σⱼ aᵢⱼ - "for each i, for each j, add aᵢⱼ." Think of a grid: you're summing every cell in a matrix.
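Nested sums translate to nested loops, one loop per Σ. A minimal sketch with a made-up 2×3 matrix:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # made-up grid of aᵢⱼ values

# Σᵢ Σⱼ aᵢⱼ - one loop per sigma
total = 0
for i in range(A.shape[0]):        # outer sum: for each row i
    for j in range(A.shape[1]):    # inner sum: for each column j
        total += A[i, j]

print(total)       # 21
print(np.sum(A))   # 21 - the vectorized version sums every cell at once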
Product Notation: Sigma's Multiplication Twin
The capital Π (pi) does for multiplication what Σ does for addition:
- Πᵢ₌₁ⁿ aᵢ - "product of aᵢ for i from 1 to n": multiply a₁ times a₂ times … times aₙ
Π shows up in probability, where you often multiply likelihoods together. If five independent events have probabilities p₁, p₂, …, p₅, the probability they all happen is:
Πᵢ₌₁⁵ pᵢ = p₁ · p₂ · p₃ · p₄ · p₅
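In NumPy the multiply-loop is np.prod. A sketch with five made-up probabilities:

import numpy as np

p = np.array([0.9, 0.8, 0.95, 0.7, 0.85])  # made-up probabilities p₁…p₅

print(np.prod(p))  # Πᵢ₌₁⁵ pᵢ ≈ 0.407 - the chance all five events happen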
Where Sums Appear in Machine Learning
Sigma notation is everywhere once you start reading ML papers. Here are the four places you'll see it most:
1. Loss functions. The mean squared error, for example:
MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
- n - number of training examples
- yᵢ - true label for example i
- ŷᵢ - predicted label for example i
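As a sketch, here's that average written both as the literal loop and as vectorized NumPy, with made-up labels and predictions:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0])  # yᵢ - made-up true labels
y_pred = np.array([2.5,  0.0, 2.0])  # ŷᵢ - made-up predictions
n = len(y_true)

# (1/n) Σᵢ (yᵢ − ŷᵢ)² as an explicit loop
mse = sum((y_true[i] - y_pred[i]) ** 2 for i in range(n)) / n

print(mse)                               # 0.1666...
print(np.mean((y_true - y_pred) ** 2))   # same thing, vectorized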
2. The dot product. Every neuron computes a weighted sum of its inputs:
z = Σᵢ₌₁ⁿ wᵢxᵢ
- z - the weighted sum (pre-activation)
- wᵢ - weight for input i
- xᵢ - i-th input feature
This is the dot product - the most computationally expensive operation in deep learning, and it's just sigma notation.
3. Softmax denominator. The softmax function turns raw scores into probabilities. The denominator is:
Σⱼ₌₁ᴷ e^zⱼ
- zⱼ - raw score for class j
- K - total number of classes
"Sum of e^zⱼ for each class j" - this is what ensures all output probabilities add up to 1.
4. Expected value. For a discrete random variable X with probability P(X = x):
E[X] = Σₓ x · P(X = x)
- E[X] - the expected value of X - a weighted average of possible values, weighted by their probabilities
"For each possible value x, multiply x by its probability, then sum." Used in reinforcement learning, Bayesian inference, and every probabilistic ML method.
The Mental Model: Sigma = For-Loop
Whenever you see Σ, mentally translate it as:
# Explicit for-loop (matches the math directly)
x = [2, 5, 3]                   # x₁=2, x₂=5, x₃=3
total = 0
for i in range(1, len(x) + 1):  # i goes from 1 to n
    total += x[i - 1]           # the expression at i (x is 0-indexed in Python)
print(total)                    # 10
# NumPy vectorized version (what you'll actually write)
import numpy as np
x = np.array([2, 5, 3]) # x₁=2, x₂=5, x₃=3
total = np.sum(x) # → 10 (Σxᵢ)
mean = np.mean(x) # → 10/3 ≈ 3.33 (x̄ = (1/n)Σxᵢ)
# Weighted sum (dot product) — the fundamental neuron operation
w = np.array([0.4, -0.2, 0.9])
z = np.dot(w, x) # → Σwᵢxᵢ = 0.4·2 + (-0.2)·5 + 0.9·3 = 2.5
That's it. Same for Π — it's a multiply-loop instead of an add-loop.
Don't let the symbols intimidate you. Behind every Σ is just a list of numbers being added up.