A Matrix Is a Grid of Numbers
Think of a spreadsheet. Rows and columns of numbers. That is a matrix.
Every neural network layer is a matrix multiplication: the layer takes in a vector and multiplies it by a weight matrix to produce the next layer's input. Every dataset is a matrix where rows are examples and columns are features. If vectors are ML's individual data points, matrices are how you process entire batches at once.
A matrix with m rows and n columns is called an m×n matrix (read "m by n"). Example of a 2×3 matrix:
A = [1 2 3]
    [4 5 6]
- A[i][j] - entry in row i, column j - row index first, column index second
Entry A[i][j] is in row i, column j. So A[1][2] = 2 (row 1, column 2) and A[2][3] = 6 (row 2, column 3). Row first, column second - always.
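The row-first convention carries straight into NumPy, with one catch: code indexing is 0-based. A minimal sketch (the matrix values here are illustrative):

```python
import numpy as np

# A 2×3 example matrix (values are illustrative)
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# NumPy indexes as [row, column], but 0-based:
# math notation "row 1, column 2" becomes A[0, 1] in code
print(A[0, 1])  # → 2
print(A[1, 2])  # → 6
print(A.shape)  # → (2, 3)
```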
Matrices in ML
Your training dataset is a matrix. With 100 examples each having 10 features, you stack them into a 100×10 matrix X where each row is one example:
- X[i][j] - feature j of example i
- m = 100 - number of examples
- n = 10 - number of features
Row i is the feature vector for example i. Column j contains feature j across all examples. This is the universal data format in ML.
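In NumPy, rows and columns of that data matrix are one slice away. A sketch with random stand-in data (the shapes match the 100×10 example above):

```python
import numpy as np

# Toy dataset: 100 examples, 10 features each (random stand-in values)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))

example_5 = X[4]      # one row: the feature vector of example 5, shape (10,)
feature_2 = X[:, 1]   # one column: feature 2 across all examples, shape (100,)
print(example_5.shape, feature_2.shape)  # → (10,) (100,)
```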
Matrix-Vector Multiplication
Multiplying an m×n matrix by an n-dimensional column vector produces an m-dimensional column vector. The rule: each row of the matrix dots with the vector to produce one output number.
- W - weight matrix - m rows, n columns
- x - input vector - n elements
- y = Wx - output vector - m elements
Concrete example:
- A = [1 2; 3 4] - 2×2 matrix
- x = [5, 6] - 2-element vector
- Ax = [1·5 + 2·6, 3·5 + 4·6] = [17, 39]
Shape rule: (m×n) matrix times (n×1) vector → (m×1) vector. The inner dimensions must match - the number of columns in the matrix must equal the number of rows in the vector.
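The row-by-row rule can be checked directly in NumPy. A sketch using an illustrative 2×2 matrix and 2-element vector:

```python
import numpy as np

# Illustrative 2×2 matrix and 2-element vector
A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

# Each row of A dots with x to produce one output entry
manual = np.array([A[0] @ x, A[1] @ x])  # [1*5 + 2*6, 3*5 + 4*6]
print(manual)  # → [17 39]
print(A @ x)   # → [17 39], same result via matrix-vector multiply
```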
Matrix-Matrix Multiplication
You can multiply two matrices together when the shapes are compatible. For (m×k) times (k×n): the result is (m×n).
The entry-by-entry rule: C[i][j] = sum over k of A[i][k] · B[k][j].
- C[i][j] - entry in row i, column j of the result
Shape check: (m×k)(k×n) → (m×n) ✓
Memory trick: (m×k) · (k×n) → (m×n). The two k's cancel. What remains are the outer dimensions.
Each output cell C[i][j] is the dot product of row i from A with column j from B — which is why the inner dimensions must match. In a neural network, this is how all inputs combine with all weights in one operation.
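The entry rule can be spelled out as explicit dot products and checked against NumPy's `@` operator. A sketch with illustrative (2×3) and (3×2) matrices:

```python
import numpy as np

# Shapes: (2×3) @ (3×2) → (2×2); the inner dimension 3 must match
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])

# Entry rule: C[i][j] = row i of A dotted with column j of B
C_manual = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        C_manual[i, j] = A[i] @ B[:, j]

print(np.array_equal(C_manual, A @ B))  # → True
print((A @ B).shape)                    # → (2, 2)
```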
Why Matrix Multiplication Is Everywhere in ML
A fully connected neural network layer is a matrix multiplication: z = Wx + b, where:
- W - weight matrix - shape (neurons_out × neurons_in)
- x - input vector
- b - bias vector
Every unit in the output receives a weighted sum of all inputs - which is exactly what this matrix multiplication computes. The bias is a vector added afterward.
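A minimal sketch of that layer in NumPy, reusing the 2×3 weight matrix from the code at the end of this section (the bias values here are illustrative):

```python
import numpy as np

# Fully connected layer: z = Wx + b
W = np.array([[0.1, -0.2, 0.3],
              [0.4, 0.5, 0.6]])   # shape (neurons_out × neurons_in) = 2×3
b = np.array([0.5, -0.5])         # one bias per output neuron (illustrative)
x = np.array([1.0, 2.0, 3.0])     # 3-dim input

z = W @ x + b                     # weighted sum of all inputs, per output unit
print(z.shape)  # → (2,)
```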
Processing a full batch:
If you have a batch of 32 inputs, each 128-dimensional, stack them as a matrix X of shape 32×128. With a weight matrix W of shape 64×128, compute XWᵀ to get a 32×64 result - one 64-dim output per example. All 32 predictions compute in parallel on the GPU.
The full forward pass of a 3-layer network:
- W1 - weight matrix of layer 1
- h1 = σ(W1 x + b1) - activations after layer 1
- W2 - weight matrix of layer 2
- h2 = σ(W2 h1 + b2) - activations after layer 2
- W3 - weight matrix of the output layer
- ŷ = W3 h2 + b3 - final prediction
Each layer is one matrix multiplication plus a bias addition. The entire forward pass is a chain of matrix ops - which is why PyTorch and TensorFlow are fundamentally matrix computation libraries with automatic differentiation built on top.
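That chain can be sketched in a few lines of NumPy. The layer sizes (4 → 8 → 8 → 2), random weights, and the choice of ReLU as the nonlinearity are all assumptions for illustration:

```python
import numpy as np

# Sketch of a 3-layer forward pass; layer sizes 4 → 8 → 8 → 2 are assumed
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 8)), np.zeros(8)
W3, b3 = rng.standard_normal((2, 8)), np.zeros(2)

def relu(v):
    return np.maximum(v, 0.0)   # ReLU chosen as an example nonlinearity

x = rng.standard_normal(4)
h1 = relu(W1 @ x + b1)   # layer 1: matrix multiply + bias + nonlinearity
h2 = relu(W2 @ h1 + b2)  # layer 2: same pattern
y = W3 @ h2 + b3         # output layer: raw scores, no activation
print(y.shape)  # → (2,)
```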
import numpy as np
# Matrix creation
A = np.array([[1, 2], [3, 4], [5, 6]]) # 3×2 matrix
print(A.shape) # → (3, 2)
# Matrix-vector multiplication
W = np.array([[0.1, -0.2, 0.3],
[0.4, 0.5, 0.6]]) # 2×3 weight matrix
x = np.array([1.0, 2.0, 3.0]) # 3-dim input
z = W @ x # → 2-dim output (W·x)
print(z)
# Batch: process 4 examples at once (shape 4×3)
X_batch = np.random.randn(4, 3)
Z_batch = X_batch @ W.T # → shape (4, 2)
print(Z_batch.shape)