Lesson 5 ⏱ 10 min

Outer products and rank-1 matrices


Outer Products: Building Matrices from Vectors

How multiplying a column vector by a row vector produces a full matrix — and why this pattern appears constantly in gradient descent and attention.


Quick refresher

Matrix multiplication

An m×1 column vector times a 1×n row vector gives an m×n matrix. Each entry is a product of one element from each vector.

Example

[1, 2, 3]ᵀ ⊗ [4, 5] = [[4,5],[8,10],[12,15]].

You already know how to multiply a matrix by a vector: it transforms a vector into a new vector. But there's a different kind of multiplication that builds a matrix from scratch using just two vectors. It's called the outer product, and it shows up constantly in ML.

Building a matrix from two vectors

Say you have two vectors:

\mathbf{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}

The outer product \mathbf{a} \otimes \mathbf{b} (also written \mathbf{a}\mathbf{b}^\top) multiplies every entry of a with every entry of b:

\mathbf{a}\mathbf{b}^\top = \begin{bmatrix} 1 \cdot 4 & 1 \cdot 5 \\ 2 \cdot 4 & 2 \cdot 5 \\ 3 \cdot 4 & 3 \cdot 5 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}

The result is a 3×2 matrix: the length of a (3 rows) by the length of b (2 columns).

The rule

For a column vector \mathbf{a} of length m and a row vector \mathbf{b}^\top of length n:

(\mathbf{a}\mathbf{b}^\top)_{ij} = a_i \cdot b_j

Entry at row i, column j is just the product of the i-th element of a and the j-th element of b. No summing — just multiplying pairs.
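
A minimal NumPy sketch of this rule: build the matrix entry by entry with aᵢ · bⱼ, then check it against np.outer.

import numpy as np

a = np.array([1, 2, 3])        # length m = 3
b = np.array([4, 5])           # length n = 2

# Entry (i, j) is just a[i] * b[j]: no summing involved
manual = np.zeros((len(a), len(b)))
for i in range(len(a)):
    for j in range(len(b)):
        manual[i, j] = a[i] * b[j]

# np.outer builds the same 3×2 matrix in one call
assert np.array_equal(manual, np.outer(a, b))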

What does a rank-1 matrix look like?

The matrix you get from an outer product has a special structure: every row is a scaled copy of the same vector (\mathbf{b}^\top), scaled by the corresponding entry of \mathbf{a}.

\begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix} = \begin{bmatrix} 1 \cdot [4, 5] \\ 2 \cdot [4, 5] \\ 3 \cdot [4, 5] \end{bmatrix}

All three rows point in the same direction — just different lengths. This is called a rank-1 matrix: it only has one independent direction.
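
You can check both claims numerically; a small sketch, assuming the same a and b as above:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])
M = np.outer(a, b)

# Every row i is a[i] * b: the same direction, just rescaled
for i in range(len(a)):
    assert np.array_equal(M[i], a[i] * b)

# Only one independent direction, so the rank is 1
print(np.linalg.matrix_rank(M))   # 1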

Why this matters for ML

Outer products appear in three key places:

1. Weight gradients in a single layer

When a neural network layer computes \mathbf{y} = W\mathbf{x}, the gradient of the loss with respect to the weight matrix is:

\frac{\partial L}{\partial W} = \boldsymbol{\delta} \, \mathbf{x}^\top

That's an outer product: \boldsymbol{\delta} is the error signal flowing back, and \mathbf{x} is the layer's input. Their outer product tells us how much each weight contributed to the error.
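
As a sanity check, here is a small PyTorch sketch (with a made-up linear layer and loss, not any particular network) confirming that autograd's gradient of the loss with respect to W equals the outer product of \boldsymbol{\delta} with \mathbf{x}:

import torch

W = torch.randn(2, 3, requires_grad=True)    # a 2×3 weight matrix
x = torch.tensor([1.0, 2.0, 3.0])            # layer input
y = W @ x                                    # forward pass: y = Wx
loss = (y ** 2).sum()                        # any scalar loss will do
loss.backward()

delta = 2 * y.detach()                       # dL/dy for this particular loss

# Autograd's gradient matches the outer product delta xᵀ
print(torch.allclose(W.grad, torch.outer(delta, x)))   # True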

2. Attention patterns

In transformers, the attention matrix is built by computing dot products between all pairs of query and key vectors. For one head:

A = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)

The QK^\top part is essentially a sum of outer products over the feature dimensions: entry (i, j) is query i dotted with key j.
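
To make the "sum of outer products" reading concrete, here is a toy NumPy sketch (random Q and K, no softmax): QKᵀ equals the sum, over the d feature dimensions, of outer products of the corresponding columns.

import numpy as np

rng = np.random.default_rng(0)
n_q, n_k, d = 4, 6, 8
Q = rng.normal(size=(n_q, d))                # query vectors, one per row
K = rng.normal(size=(n_k, d))                # key vectors, one per row

scores = Q @ K.T / np.sqrt(d)                # raw attention scores, shape (4, 6)

# Same matrix as a sum of d rank-1 outer products (one per feature dimension)
outer_sum = sum(np.outer(Q[:, k], K[:, k]) for k in range(d)) / np.sqrt(d)
print(np.allclose(scores, outer_sum))        # True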

3. Low-rank approximations

Many large matrices in ML (embeddings, weight matrices) are approximated as a sum of a few rank-1 matrices. This is the foundation of techniques like LoRA (Low-Rank Adaptation), which adapts huge models cheaply by adding small outer-product corrections.
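
A toy sketch of the low-rank idea, not the real LoRA code: keep a large weight matrix frozen and add a correction B @ A built from two thin matrices, which is just a sum of r outer products.

import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 256, 256, 4                 # adapter rank r is much smaller than d

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weights
B = rng.normal(size=(d_out, r))              # trainable, tall and thin
A = rng.normal(size=(r, d_in))               # trainable, short and wide

delta_W = B @ A                              # the low-rank correction
# B @ A is a sum of r rank-1 outer products, so its rank is at most r
outer_sum = sum(np.outer(B[:, k], A[k, :]) for k in range(r))
print(np.allclose(delta_W, outer_sum))       # True
print(np.linalg.matrix_rank(delta_W))        # 4

W_adapted = W + delta_W                      # adapted weights used at inference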


Contrast: outer vs dot

It's worth pausing to make sure you keep the two operations distinct:

Operation | Takes | Returns | Intuition
Dot product \mathbf{a} \cdot \mathbf{b} | Two vectors (same length) | A single number | Measures alignment
Outer product \mathbf{a}\mathbf{b}^\top | Two vectors (any length) | A matrix (m×n) | Combines every pair

Think of the dot product as collapsing two vectors into one number, and the outer product as expanding two vectors into a grid of combinations.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])

# Outer product: shape (3, 2)
outer = np.outer(a, b)
print(outer)
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
# entry (i,j) = a[i] * b[j]

# In gradient computation: δ ⊗ xᵀ gives weight gradient
delta = np.array([0.5, -0.3])   # output error signal (2-dim)
x_in  = np.array([1.0, 2.0, 3.0])  # input features (3-dim)
dW    = np.outer(delta, x_in)   # weight gradient shape (2, 3)
print(dW)

# PyTorch equivalent
import torch
a_t = torch.tensor([1.0, 2.0, 3.0])
b_t = torch.tensor([4.0, 5.0])
print(torch.outer(a_t, b_t))

A useful identity

Any matrix M can be written as a sum of rank-1 matrices:

M = \sum_k \sigma_k \, \mathbf{u}_k \mathbf{v}_k^\top

This is the Singular Value Decomposition (SVD), which you'll meet later when studying PCA and embeddings. Each term is an outer product, weighted by the singular value \sigma_k.
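
A small NumPy sketch of that identity: take the SVD of a random matrix and rebuild it as a weighted sum of rank-1 outer products.

import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 2))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Summing sigma_k * u_k v_kᵀ over all singular values recovers M exactly
reconstructed = sum(s[k] * np.outer(U[:, k], Vt[k, :]) for k in range(len(s)))
print(np.allclose(M, reconstructed))   # True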

Summary

  • The outer product of an m-vector and an n-vector gives an m×n matrix.
  • Entry (i, j) = aᵢ × bⱼ — multiply pairs, no summing.
  • The result is rank-1: all rows are multiples of the same vector.
  • Outer products describe weight gradients (δ·xᵀ) and underlie attention and low-rank methods.

Quiz

1 / 3

What is the outer product of [1, 2]ᵀ and [3, 4]?