Math Foundation Functions & Notation
Lesson 2 ⏱ 10 min

Mathematical notation


Reading Mathematical Notation - Decoding the Symbols

Translates subscripts, superscripts, Greek letters, hat notation, and the semicolon convention into plain English, with ML examples for every symbol introduced.

⏱ ~6 min

🧮 Quick refresher

Lists and indexing

A list is an ordered collection of items. Indexing picks out one item by its position. In most math, indexing starts at 1 (so item 1 is first); in code it often starts at 0.

Example

List [5, 3, 8, 2].

Item at index 2 is 3 (1-based).

Item at index 2 is 8 (0-based).

Context tells you which convention applies.
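
In Python (0-based), that same list looks like this:

items = [5, 3, 8, 2]

third  = items[2]   # → 8  (0-based: index 2 is the third item)
second = items[1]   # → 3  (math's "item 2" under 1-based counting)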

Why Notation Feels Hard (And Why It Isn't)

Math notation is a compression format. It's like texting abbreviations - "lol," "brb," "imo" look like gibberish until you learn the code, and then they're perfectly clear. The problem isn't that notation is hard. The problem is that most textbooks throw it at you without translating it first.

This lesson translates it. By the end, you'll be able to look at something like:

ŷᵢ = f(xᵢ; θ)

...and read it like a sentence. Let's go symbol by symbol.

Subscripts: Picking Items From a List

A subscript is the small number or letter written below and to the right of a variable. It means "the i-th one" - an index into a collection.

Think of apartment numbers. If your building has units 1, 2, 3, ..., n, then unit i is the i-th apartment. Now substitute "weight" for "apartment":

  • w₁ = the first weight
  • w₂ = the second weight
  • wᵢ = the i-th weight

When you see wᵢ for i = 1, 2, ..., n, it means "there are n weights, and wᵢ stands for whichever one the index i picks out."

In ML, you'll constantly see:

  • xᵢ = the i-th training example
  • yᵢ = the true label for the i-th example
  • ŷᵢ = the model's prediction for the i-th example

In code, the subscript just becomes a list index:

weights = [0.4, -0.2, 0.9, 0.1]  # a list of weights: w₁, w₂, w₃, w₄

# Math notation: w₃  (1-based, the 3rd weight)
# Python index:  weights[2]  (0-based, index 2 = third item)

w3_math   = weights[2]   # → 0.9   (math's w₃)
w_last    = weights[-1]  # → 0.1   (last weight, w₄)

Superscripts: Two Different Jobs

Superscripts (above and to the right) have two uses, and context always tells you which applies.

Job 1 - Exponentiation. x² means x squared. x³ means x cubed.

Job 2 - Layer labels in neural networks. In a multi-layer network, x⁽²⁾ might mean "the activations in layer 2." The parentheses around the superscript usually signal this usage. So x² = x · x, but x⁽²⁾ = "x from layer 2." Different things.

When you see a superscript: if it's inside parentheses, it's probably a layer index. If it's a bare number, it's probably an exponent.
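
Here's a minimal Python sketch of the two jobs, with made-up activation values standing in for real layers:

x = 3
x_squared = x ** 2            # exponent: x² = 9

# Layer labels: store each layer's activations in a list, so
# "x from layer 2" is just an index into the list, not a power.
layer_activations = [
    [0.5, 1.2],   # x⁽⁰⁾: input layer (made-up numbers)
    [0.7, 0.1],   # x⁽¹⁾: layer 1
    [0.9, 0.3],   # x⁽²⁾: layer 2
]
x_layer_2 = layer_activations[2]   # x⁽²⁾ → [0.9, 0.3], not x · x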

Greek Letters - Your New Vocabulary

ML papers love Greek letters. Here's the cheat sheet you actually need:

Parameters and knobs

θ (theta) = all model parameters. Instead of listing every weight and bias, you say "θ represents everything the model learned."

α (alpha) = the learning rate. In the gradient descent update w ← w − α∇L, α controls how big a step you take.

ε (epsilon) = a tiny number added to avoid division by zero. You'll see it in the Adam optimizer as a constant like 1e-8.
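
As a sketch of why ε matters, here's a normalization step guarded against a zero denominator (the numbers are made up, and this shows the general pattern rather than Adam's exact update):

values = [2.0, 2.0, 2.0]          # zero spread, so std comes out to 0
mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
eps = 1e-8                        # ε: tiny constant

normalized = [(v - mean) / (std + eps) for v in values]  # safe even when std == 0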

Functions and aggregation

σ (lowercase sigma) = the sigmoid function σ(x) = 1/(1 + e⁻ˣ). Outputs values between 0 and 1.

Σ (capital sigma) = summation. "Add all of these up." This is so common it gets its own lesson - coming right up.

μ (mu) = the mean (average) of a distribution.
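
A quick sketch of all three in plain Python:

import math

def sigmoid(x):
    # σ(x) = 1 / (1 + e⁻ˣ): squashes any number into (0, 1)
    return 1 / (1 + math.exp(-x))

values = [1.0, 2.0, 3.0, 4.0]
total = sum(values)          # Σ: add them all up → 10.0
mu = total / len(values)     # μ: the mean → 2.5

sigmoid(0)                   # → 0.5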

Penalty strength

λ (lambda) = regularization strength. Controls how harsh the penalty for large weights is.
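
As an illustration, L2 regularization adds λ · Σ wᵢ² to the loss; here's a rough sketch with made-up numbers:

weights = [0.4, -0.2, 0.9, 0.1]
lam = 0.01                                    # λ ("lambda" is a Python keyword, so: lam)
base_loss = 0.35                              # made-up data-fit loss

penalty = lam * sum(w ** 2 for w in weights)  # λ · Σ wᵢ²  → 0.0102
total_loss = base_loss + penalty              # larger λ → harsher penalty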

Reading Equations Left to Right

Equations are sentences. Read them that way. Take the gradient descent update:

w ← w − α∇L

  • w = the weight parameter being updated
  • α = the learning rate (step size)
  • ∇L = the gradient of the loss with respect to w (the direction of steepest increase)

Left to right: "the new value of w becomes the old value of w, minus α times the gradient of L." The arrow ← means "assign" or "update." This is one of the most important equations in ML - and it's just a sentence about subtracting a scaled gradient.
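
Here's that sentence as one line of Python, with a made-up gradient value:

w = 0.8                 # current weight
alpha = 0.1             # α: learning rate
grad_L = 2.5            # ∇L: made-up gradient of the loss at w

w = w - alpha * grad_L  # w ← w − α∇L  → 0.55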

Hat Notation: "This Is an Estimate"

When you see a hat (^) over a variable, it means "this is an estimate of the thing without the hat." So if y is the true label, ŷ is the model's prediction - its best estimate of y.

The Semicolon: Separating Input From Parameters

You'll often see notation like f(x; θ). The semicolon separates two kinds of things:

  • Left of the semicolon: what varies during inference - x, the input
  • Right of the semicolon: what's fixed once the model is trained - θ, the parameters

It says: "This function takes input x and its behavior is shaped by θ. When you're using the trained model, θ is baked in. When you're training, you're adjusting θ."
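
A toy one-weight model makes the split concrete (the function and values below are just illustrations):

def f(x, theta):
    # f(x; θ): x is the input that varies; θ is fixed once training is done
    return theta * x

theta = 0.9            # learned parameter, "baked in" after training
y_hat = f(2.0, theta)  # inference: only x varies → 1.8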

Putting It Together

Let's read one complete expression:

ŷᵢ = f(xᵢ; θ)

  • ŷᵢ = the model's predicted output for example i
  • xᵢ = the i-th training input
  • θ = all model parameters

Translation: "For the i-th training example, the model's prediction is obtained by passing the i-th input through the function f, whose behavior is controlled by parameters θ."
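
Reusing the same toy model from above, the whole expression becomes a loop (inputs and θ are made up):

def f(x, theta):
    return theta * x                    # toy model: f(x; θ)

theta = 0.9                             # θ: all model parameters (just one here)
xs = [1.0, 2.0, 3.0]                    # x₁, x₂, x₃: made-up training inputs

y_hat = [f(x_i, theta) for x_i in xs]   # ŷᵢ = f(xᵢ; θ) for each i
# → [0.9, 1.8, 2.7]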

That's it. Once you know the code, equations stop being walls of symbols and start being sentences. Keep this page bookmarked - when you hit notation that confuses you, come back here first.

Quiz

1 / 3

What does wᵢ represent when i = 3?