Math Foundation Vectors & Matrices
Lesson 3 ⏱ 14 min

Matrices and multiplication


Matrices: Grids That Transform Data

Matrix-vector and matrix-matrix multiplication from scratch. How a neural network layer is a matrix multiply, and why GPUs exist to do this fast.


Quick refresher

Dot product

The dot product of two vectors a and b is Σᵢ aᵢbᵢ - multiply corresponding elements and sum. It equals ||a||·||b||·cos(θ). Perpendicular vectors have dot product 0.

Example

[1, 2, 3]·[4, 5, 6] = 1·4 + 2·5 + 3·6 = 4+10+18 = 32.
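The same arithmetic is a one-liner in NumPy (the library used in the code later in this lesson); a quick sketch checking the example above:

```python
import numpy as np

# Dot product: multiply corresponding elements and sum.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

manual = sum(ai * bi for ai, bi in zip(a, b))   # 1*4 + 2*5 + 3*6
print(manual)        # → 32
print(np.dot(a, b))  # → 32
print(a @ b)         # → 32 (@ also acts as the dot product on 1-D arrays)
```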

A Matrix Is a Grid of Numbers

Think of a spreadsheet. Rows and columns of numbers. That is a matrix.

Every neural network layer is a matrix multiplication: the layer takes in a vector and multiplies it by a weight matrix to produce the next layer's input. Every dataset is a matrix where rows are examples and columns are features. If vectors are ML's individual data points, matrices are how you process entire batches at once.

A matrix with m rows and n columns is called an m×n matrix (read "m by n"). Example of a 2×3 matrix:

\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
A_{ij}
entry in row i, column j - row index first, column index second

Entry A_{ij} is in row i, column j. So A_{12} = 2 (row 1, column 2) and A_{23} = 6 (row 2, column 3). Row first, column second - always.
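The same indexing in code, with one caveat: NumPy counts from 0, so math's A₁₂ becomes A[0, 1]. A minimal sketch using the 2×3 matrix above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # the 2×3 example matrix

# NumPy is 0-indexed: "row 1, column 2" in math notation is A[0, 1].
# Row index first, column index second - same convention as the math.
print(A[0, 1])   # → 2  (A₁₂ in 1-based math notation)
print(A[1, 2])   # → 6  (A₂₃)
print(A.shape)   # → (2, 3), i.e. 2 rows, 3 columns
```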

Matrices in ML

Your training dataset is a matrix. With 100 examples each having 10 features, you stack them into a 100×10 matrix where each row is one example:

\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}
x_{ij}
feature j of example i
n
number of examples
p
number of features

Row ii is the feature vector for example ii. Column jj contains feature jj across all examples. This is the universal data format in ML.
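A short sketch of this convention in NumPy, using random data (the 100×10 shape matches the example above; the data itself is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))   # 100 examples, 10 features each

row = X[4]       # row 4 (0-indexed): the feature vector of one example
col = X[:, 2]    # column 2: one feature across all 100 examples

print(X.shape)    # → (100, 10)
print(row.shape)  # → (10,)
print(col.shape)  # → (100,)
```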

Matrix-Vector Multiplication

Multiplying an m×n matrix by an n-dimensional column vector produces an m-dimensional column vector. The rule: each row of the matrix dots with the vector to produce one output number.

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1x + 2y \\ 3x + 4y \end{bmatrix}
W
weight matrix - m rows, n columns
x
input vector - n elements
y
output vector - m elements

Concrete example:

\begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \cdot 3 + 2 \cdot 4 \\ 5 \cdot 3 + 6 \cdot 4 \end{bmatrix} = \begin{bmatrix} 11 \\ 39 \end{bmatrix}
A
2×2 matrix
v
2-element vector

Shape rule: an (m × n) matrix times an (n × 1) vector gives an (m × 1) vector. The n must match - the number of columns in the matrix must equal the number of rows in the vector.
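The rule can be sketched directly in code: one loop over rows, one dot product per row. A hand-rolled version checked against NumPy's built-in (the function name matvec is just illustrative):

```python
import numpy as np

def matvec(W, x):
    """Matrix-vector product: each row of W dots with x to give one number."""
    m, n = W.shape
    assert n == len(x), "columns of W must equal length of x"
    y = np.zeros(m)
    for i in range(m):
        y[i] = W[i] @ x   # dot product of row i with the vector
    return y

A = np.array([[1, 2],
              [5, 6]])
v = np.array([3, 4])
print(matvec(A, v))   # → [11. 39.]
print(A @ v)          # → [11 39] (NumPy's built-in, same numbers)
```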

Matrix-Matrix Multiplication

You can multiply two matrices together when the shapes are compatible. For A (m×k) times B (k×n), the result is C (m×n).

The entry-by-entry rule: C_{ij} = (row i of A) · (column j of B).

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
C_{ij}
entry in row i, column j of the result

Check: (2 × 2) × (2 × 2) → (2 × 2).

Memory trick: (m × k) × (k × n) → (m × n). The two k's cancel; what remains are the outer dimensions.
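The entry-by-entry rule translates to a double loop, one iteration per output cell. A sketch that reproduces the worked example and checks it against NumPy's @ (the function name matmul here is illustrative, not NumPy's own):

```python
import numpy as np

def matmul(A, B):
    """C[i][j] = (row i of A) · (column j of B)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, :] @ B[:, j]   # one dot product per output cell
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = matmul(A, B)
print(C)
# → [[19. 22.]
#    [43. 50.]]
print(np.array_equal(C, A @ B))   # → True
```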

[Interactive: Matrix Multiplication — hover output cells to see how each entry of C = AB (19, 22, 43, 50) is computed]

Each output cell C[i][j] is the dot product of row i from A with column j from B — which is why the inner dimensions must match. In a neural network, this is how all inputs combine with all weights in one operation.

Why Matrix Multiplication Is Everywhere in ML

A fully connected neural network layer is matrix multiplication:

\mathbf{a} = \mathbf{W}\mathbf{x} + \mathbf{b}
W
weight matrix - shape (neurons_out × neurons_in)
x
input vector
b
bias vector

Every unit in the output receives a weighted sum of all inputs - which is exactly what this matrix multiplication computes. The bias b\mathbf{b} is a vector added afterward.

Processing a full batch:

If you have a batch of 32 inputs, each 128-dimensional, stack them as a matrix B of shape 32×128. With a weight matrix W of shape 64×128, compute B Wᵀ to get a 32×64 result - one 64-dimensional output per example. All 32 predictions compute in parallel on the GPU.

The full forward pass of a 3-layer network:

\mathbf{z}_1 = \mathbf{W}_1 \mathbf{x} + \mathbf{b}_1, \quad \mathbf{a}_1 = \text{ReLU}(\mathbf{z}_1)
W_1
weight matrix of layer 1
a_1
activations after layer 1
\mathbf{z}_2 = \mathbf{W}_2 \mathbf{a}_1 + \mathbf{b}_2, \quad \hat{\mathbf{y}} = \mathbf{W}_3\,\text{ReLU}(\mathbf{z}_2) + \mathbf{b}_3
W_2
weight matrix of layer 2
W_3
weight matrix of output layer
ŷ
final prediction

Each layer is one matrix multiplication plus a bias addition. The entire forward pass is a chain of matrix ops - which is why PyTorch and TensorFlow are fundamentally matrix computation libraries with automatic differentiation built on top.

import numpy as np

# Matrix creation
A = np.array([[1, 2], [3, 4], [5, 6]])   # 3×2 matrix
print(A.shape)   # → (3, 2)

# Matrix-vector multiplication
W = np.array([[0.1, -0.2, 0.3],
              [0.4,  0.5, 0.6]])   # 2×3 weight matrix
x = np.array([1.0, 2.0, 3.0])     # 3-dim input
z = W @ x                          # → 2-dim output (W·x)
print(z)

# Batch: process 4 examples at once (shape 4×3)
X_batch = np.random.randn(4, 3)
Z_batch = X_batch @ W.T            # → shape (4, 2)
print(Z_batch.shape)
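The 3-layer forward pass from the equations above can also be sketched directly. The layer sizes (4 → 8 → 8 → 2) and random weights here are illustrative assumptions, not values from the lesson:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(42)
# Hypothetical layer sizes: 4 inputs → 8 hidden → 8 hidden → 2 outputs
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 8)), np.zeros(8)
W3, b3 = rng.standard_normal((2, 8)), np.zeros(2)

def forward(x):
    a1 = relu(W1 @ x + b1)        # layer 1: matrix multiply + bias + ReLU
    z2 = W2 @ a1 + b2             # layer 2
    y_hat = W3 @ relu(z2) + b3    # output layer
    return y_hat

x = rng.standard_normal(4)
print(forward(x).shape)   # → (2,)
```

Note how every line of forward is a matrix-vector multiply plus a bias vector, exactly matching the equations.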

Quiz

1 / 3

A 2×3 matrix multiplied by a 3×1 vector produces a...