You've seen scalars (single numbers), vectors (lists of numbers), and matrices (grids of numbers). These are all special cases of one unifying concept: a tensor. Understanding tensors is essential because every piece of data in ML — images, text, audio, batches of examples — is stored and processed as a tensor.
The hierarchy
Each step up adds a dimension:
| Name | Dimensions | Shape example | What it holds |
|---|---|---|---|
| Scalar | 0D | () | A single number: 3.14 |
| Vector | 1D | (n,) | A list: [1, 2, 3] |
| Matrix | 2D | (m, n) | A grid: a spreadsheet |
| 3D tensor | 3D | (d₁, d₂, d₃) | A stack of matrices |
| 4D tensor | 4D | (d₁, d₂, d₃, d₄) | A batch of 3D tensors |
| N-D tensor | ND | (d₁, …, dₙ) | Anything |
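A quick NumPy sketch of the hierarchy (PyTorch tensors expose the same `ndim` and `shape` attributes):

```python
import numpy as np

scalar = np.array(3.14)            # 0D: shape ()
vector = np.array([1, 2, 3])       # 1D: shape (3,)
matrix = np.zeros((2, 3))          # 2D: shape (2, 3)
stack  = np.zeros((4, 2, 3))       # 3D: shape (4, 2, 3), a stack of matrices

print(scalar.ndim, vector.ndim, matrix.ndim, stack.ndim)  # 0 1 2 3
```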
Rank: how many dimensions?
The rank of a tensor is the number of axes it has (also called order or ndim).
- A scalar has rank 0.
- A vector has rank 1.
- A matrix has rank 2.
- A batch of RGB images has rank 4.
Shape: the size of each dimension
The shape of a tensor lists the size along each axis. A matrix with 5 rows and 3 columns has shape (5, 3). A 3D tensor with shape (4, 5, 3) is like a stack of four (5, 3) matrices.
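You can build that stack-of-matrices picture directly, here with NumPy's `np.stack`:

```python
import numpy as np

m = np.arange(15).reshape(5, 3)   # one (5, 3) matrix
t = np.stack([m, m, m, m])        # stack four of them along a new first axis

print(t.shape)     # (4, 5, 3)
print(t[0].shape)  # (5, 3): each slice is one of the original matrices
```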
Real data shapes you'll see constantly
A single grayscale image: shape (H, W) — height × width.
A single color image: shape (H, W, C) or (C, H, W) depending on convention (PyTorch prefers channels-first).
A batch of 32 color images at 224×224: shape (32, 3, 224, 224).
A sequence of 128 tokens, each embedded as a 512-dim vector: shape (128, 512).
A batch of 64 sequences: shape (64, 128, 512).
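As a sketch, each of these shapes can be created directly (the dimensions here match the examples above; the contents are just zeros as placeholders):

```python
import numpy as np

gray   = np.zeros((28, 28))           # single grayscale image (H, W)
rgb    = np.zeros((3, 224, 224))      # one color image, channels-first (C, H, W)
images = np.zeros((32, 3, 224, 224))  # batch of 32 color images
tokens = np.zeros((128, 512))         # 128 tokens, each a 512-dim embedding
batch  = np.zeros((64, 128, 512))     # batch of 64 such sequences
```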
Why batching uses a tensor
When you train a neural network, you rarely process one example at a time — it's slow and the gradients are noisy. Instead, you process a batch of examples simultaneously.
Batching just means stacking individual examples along a new first axis.
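For example, stacking 32 individual (5, 3) examples produces one (32, 5, 3) batch, with the batch size as the new first dimension:

```python
import numpy as np

examples = [np.random.randn(5, 3) for _ in range(32)]  # 32 separate examples
batch = np.stack(examples)                             # stack along a new first axis

print(batch.shape)  # (32, 5, 3)
```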
This is efficient because modern hardware (GPUs) can apply the same operation to all N examples in parallel. The math stays the same — operations just apply to each slice along the batch axis.
Indexing into tensors
Just like a matrix entry needs two indices (row, column), a tensor entry needs one index per axis.
For a tensor T of shape (4, 5, 3):
- T[i] selects one slice of shape (5, 3).
- T[i, j] selects one row of shape (3,).
- T[i, j, k] selects a single number.
In NumPy/PyTorch, this looks like: T[2, 1, 0] → the number at position (2, 1, 0).
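A minimal NumPy sketch, using a (4, 5, 3) tensor filled with the numbers 0 to 59:

```python
import numpy as np

T = np.arange(4 * 5 * 3).reshape(4, 5, 3)

print(T[2].shape)     # (5, 3): one slice
print(T[2, 1].shape)  # (3,):   one row
print(T[2, 1, 0])     # 33:     a single number
```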
Reshaping: reorganizing without changing values
You can reorganize a tensor into a different shape as long as the total number of elements stays the same. This is called reshaping (or view in PyTorch).
A 28×28 image has 784 pixels. Flattening turns the 2D grid into a 1D vector — same 784 numbers, different arrangement. This is how fully-connected layers accept image input.
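The flattening step looks like this in NumPy (PyTorch's `reshape` and `view` work the same way; the image contents here are just placeholder values):

```python
import numpy as np

img = np.arange(28 * 28).reshape(28, 28)  # a fake 28x28 "image"
flat = img.reshape(-1)                    # -1 means "infer this dimension"

print(flat.shape)               # (784,)
print(flat.sum() == img.sum())  # True: same numbers, different arrangement
```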
Broadcasting: operating on different shapes
When you add a scalar to a vector, Python/NumPy "broadcasts" the scalar to match the vector's shape. The same idea extends to tensors:
This means you can add a bias vector of shape (512,) to a batch of activations of shape (64, 128, 512) — the bias is added to every position automatically. No explicit looping required.
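A sketch of exactly that case, with a bias of ones so the effect is visible (real biases would be learned values):

```python
import numpy as np

activations = np.zeros((64, 128, 512))  # batch of activations
bias = np.ones(512)                     # shape (512,)

out = activations + bias  # bias broadcast across the first two axes

print(out.shape)  # (64, 128, 512): every position got the bias added
```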
Tensors in code
In PyTorch, everything is a Tensor:
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])  # shape (3,) — a vector
W = torch.randn(4, 3)              # shape (4, 3) — a matrix
y = W @ x                          # shape (4,) — matrix-vector multiply

batch = torch.randn(32, 3)         # 32 vectors, shape (32, 3)
out = batch @ W.T                  # (32, 3) @ (3, 4) → (32, 4)
```
Every operation — addition, multiplication, matrix products — just works element-wise or along specified axes, automatically handling the batch dimension.
Summary
- A tensor is an N-dimensional array. Scalars, vectors, and matrices are all special cases.
- Rank = number of dimensions. Shape = size along each axis.
- Batching adds a first axis of size N, letting GPUs process N examples at once.
- Reshaping reorganizes elements without changing their values (and, when the result is a view, without copying data).
- In ML, "tensor" always means N-D array — not the physics definition.
In practice, most of your code will work on rank-2 and rank-3 tensors (matrices and batched sequences), with rank-4 for images. Understanding shapes is half the job of debugging neural network code.