Everything Starts as Numbers
ML algorithms understand exactly one language: numbers. Not images. Not text. Not audio. Numbers. The first step in any ML project is converting whatever data you have into vectors of numbers.
This sounds limiting. It is actually a superpower. Once data is a vector, every mathematical tool we have built - derivatives, matrix operations, optimization - becomes available.
This conversion is the translation between the real world and the mathematical universe your model lives in.
Images as Numbers
A grayscale image is a 2D grid of numbers. Each pixel holds a value from 0 (black) to 255 (white). An H × W pixel image is an H × W matrix. Flatten it row by row: H · W numbers in a 1D vector.
- H - image height in pixels
- W - image width in pixels
- C - color channels: 1 for grayscale, 3 for RGB
- H · W · C - total feature count after flattening
A color (RGB) image has three channels. A color image becomes H · W · 3 numbers when flattened. Neural networks can work with either the 3D tensor (CNNs, which preserve spatial layout) or the flat vector (dense networks, which treat all pixels independently).
Preserving that spatial layout is why CNNs outperform flat networks for image data.
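A minimal sketch of the flattening step in NumPy, using a made-up 4 × 4 image (real images are larger, but the arithmetic is the same):

```python
import numpy as np

# Hypothetical 4x4 grayscale image: each pixel is 0 (black) to 255 (white).
gray = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
    [  8,  72, 136, 200],
], dtype=np.uint8)

flat_gray = gray.flatten()   # row by row: H * W = 16 numbers
print(flat_gray.shape)       # (16,)

# Hypothetical 4x4 RGB image: shape (H, W, C) with C = 3 channels.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
flat_rgb = rgb.flatten()     # H * W * C = 48 numbers
print(flat_rgb.shape)        # (48,)
```

A CNN would consume `gray` or `rgb` as-is; a dense network would consume the flattened vectors.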
Text as Numbers
Text does not come pre-numbered. You choose a representation.
Bag of words: count how many times each vocabulary word appears. A sentence becomes a vector the length of the vocabulary, mostly zeros. Simple and fast, but loses word order entirely.
One-hot encoding: each word maps to a vector of all zeros except a single 1 at the word's index. "cat" might be index 247 in a 10,000-word vocabulary. Problem: every word is equally distant from every other word, which is clearly wrong.
Word embeddings: encode semantic similarity as geometric closeness in vector space.
Each word maps to a dense vector with 100-1000 dimensions. This is what word2vec, GloVe, and transformer embeddings produce. We cover embeddings in detail later in the course.
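The first two representations are simple enough to sketch directly. A toy example, assuming a made-up five-word vocabulary:

```python
# Hypothetical tiny vocabulary; real systems use tens of thousands of words.
vocab = ["cat", "dog", "sat", "the", "mat"]
index = {word: i for i, word in enumerate(vocab)}

def bag_of_words(sentence):
    """Count occurrences of each vocabulary word; word order is lost."""
    counts = [0] * len(vocab)
    for word in sentence.lower().split():
        if word in index:
            counts[index[word]] += 1
    return counts

def one_hot(word):
    """All zeros except a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(bag_of_words("the cat sat the mat"))  # [1, 0, 1, 2, 1]
print(one_hot("cat"))                       # [1, 0, 0, 0, 0]
```

Note that both vectors have the vocabulary's length, not the sentence's: that is what makes them fixed-length inputs.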
Tabular Data
If your data is already in a spreadsheet - customer records, sensor readings, financial data - you are most of the way there. Two things to address:
Numbers: use as-is, but normalize them to put all features on comparable scales.
Categories: never assign arbitrary numbers to unordered categories. NYC=1, LA=2, Chicago=3 implies LA is the midpoint of NYC and Chicago - which is meaningless. Use one-hot encoding: a binary column per category, with a 1 indicating which value applies.
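Both steps fit in a few lines. A sketch with invented values (real pipelines often use scikit-learn's StandardScaler and OneHotEncoder, but the arithmetic is just this):

```python
import numpy as np

# Hypothetical numeric feature: annual income in dollars.
income = np.array([30_000.0, 60_000.0, 90_000.0])

# Standardize: subtract the mean, divide by the standard deviation,
# so income lands on a scale comparable to features like age.
income_scaled = (income - income.mean()) / income.std()

# Unordered category: one binary column per value, never arbitrary integers.
cities = ["NYC", "LA", "Chicago", "LA"]
categories = sorted(set(cities))   # ['Chicago', 'LA', 'NYC']
one_hot = np.array(
    [[1 if c == cat else 0 for cat in categories] for c in cities]
)
print(one_hot)   # one row per example, exactly one 1 per row
```

After one-hot encoding, no ordering is implied between cities: each is equally distant from the others.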
The Feature Vector
Every training example becomes a fixed-length vector of numbers - the feature vector x:
- x_j - value of feature j for this example
- p - number of features (the vector length)
Example: a loan applicant: x = (age, income, years employed, credit score, accounts) - five features, so p = 5.
Every example must have the same length p. You cannot feed a 5-feature vector to a model trained on 10.
The Dataset Matrix
Think of it as a spreadsheet: each row is one example (a photo, a customer, a sensor reading) and each column is one feature (pixel value, age, temperature). That spreadsheet is X - and every ML algorithm reads data in exactly this form.
Stack feature vectors as rows and you get the data matrix X with shape n × p:
- X - data matrix, shape n × p
- n - number of examples (rows)
- p - number of features (columns)
- y - label vector, one true output per example
The label vector y pairs each row of X with its ground-truth output.
This matrix is the universal input format: linear regression, neural networks, SVMs, decision trees - all expect rows as examples and columns as features.
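Putting it together, a sketch with three invented loan applicants (all numbers are made up for illustration):

```python
import numpy as np

# Hypothetical feature vectors: (age, income, years employed, credit score, accounts).
x1 = [34, 72_000,  6, 710, 3]
x2 = [52, 48_000, 20, 655, 5]
x3 = [29, 95_000,  2, 780, 2]

# Stack the vectors as rows: X has shape n x p.
X = np.array([x1, x2, x3], dtype=float)

# One label per example, e.g. 1 = loan repaid, 0 = defaulted.
y = np.array([1, 0, 1])

print(X.shape)   # (3, 5): n = 3 examples, p = 5 features
print(y.shape)   # (3,)
```

This (X, y) pair is exactly what a scikit-learn-style `fit(X, y)` call expects.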
Interactive example
Convert raw data of different types into the X matrix - see how each format becomes numbers
Coming soon