The Simplest Useful Model
Linear regression is the foundation of predictive modeling. It is not the most powerful tool - neural networks learn far more complex relationships - but it is the bedrock that everything else builds on. And it remains genuinely useful: housing price prediction, demand forecasting, risk scoring.
The core assumption: the relationship between the input features $x$ and the output $y$ is approximately linear. Add more square footage to a house and the price rises proportionally. Many real relationships, especially over small ranges, are close enough to linear to make this practical.
The Single-Feature Model
Start with one input and one output. The linear model is:

$$\hat{y} = wx + b$$

- $\hat{y}$ - the prediction - the hat means "estimated"
- $w$ - weight - the slope, how much the prediction changes per unit increase in $x$
- $x$ - the single input feature
- $b$ - bias - the y-intercept, shifts the entire line up or down

This is the equation of a line: $y = mx + b$ from middle school, with $m$ renamed to $w$. The symbols $w$ and $b$ are the model's parameters - the numbers we will learn from data.

Concrete example: predict house price from size. Suppose $w = 0.2$ (each extra square foot adds $200, since price is in thousands) and $b = 50$ (base price $50k). For a 1,500 sq ft house:

$$\hat{y} = 0.2 \times 1500 + 50 = 350$$

a predicted price of $350k.
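Here is a minimal sketch of that computation in plain Python - no learning yet, the parameter values are simply the ones assumed above:

```python
# Single-feature linear model: y_hat = w * x + b
w = 0.2   # weight: each extra square foot adds $0.2k = $200
b = 50    # bias: base price of $50k
x = 1500  # input feature: house size in square feet

y_hat = w * x + b
print(y_hat)  # 350.0 -> predicted price of $350k
```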
Why "Weight" and "Bias"?
The term weight captures how influential a feature is. A feature with weight $10$ is 100 times more influential than one with weight $0.1$. The magnitude tells you relative importance; the sign tells you direction (positive means higher value → higher prediction, negative means higher value → lower prediction).
The bias $b$ is what allows a line to not pass through the origin.
Multiple Features
Real problems have many features. For $p$ features $x_1, x_2, \ldots, x_p$:

$$\hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_p x_p + b = \sum_{j=1}^{p} w_j x_j + b$$

- $w_j$ - weight for feature $j$ - how much feature $j$ contributes to the prediction
- $x_j$ - value of feature $j$ for this example
- $p$ - total number of features
- $b$ - bias term
The sum $\sum_{j=1}^{p} w_j x_j$ is exactly a dot product $\mathbf{w} \cdot \mathbf{x}$: multiply each pair of matching elements and sum. This notation is compact and computationally efficient.
Three-feature house example (size, bedrooms, distance):

$$\hat{y} = w_1 \cdot \text{size} + w_2 \cdot \text{bedrooms} + w_3 \cdot \text{distance} + b$$
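As a sketch of this weighted sum in code, here is the three-feature prediction written as a NumPy dot product; the weight values are made-up illustrative numbers, not learned ones:

```python
import numpy as np

# Illustrative (made-up) weights for size, bedrooms, distance
w = np.array([0.2, 15.0, -4.0])  # negative weight: farther away -> lower price
x = np.array([1500, 3, 10])      # size (sq ft), bedrooms, distance (miles)
b = 50                           # bias, in $k

y_hat = np.dot(w, x) + b         # the weighted sum as a dot product
print(y_hat)  # 0.2*1500 + 15*3 + (-4)*10 + 50 = 355.0
```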
The model is still just a weighted sum - it defines a hyperplane in the $p$-dimensional feature space instead of a line in 2D.
The weighted sum $\mathbf{w} \cdot \mathbf{x} + b$ is the core of every linear model.
For the Full Dataset
To predict all $n$ training examples simultaneously, stack the inputs as a matrix:

$$\hat{\mathbf{y}} = X\mathbf{w} + b$$

- $X$ - data matrix, shape $n \times p$
- $\mathbf{w}$ - weight vector, shape $p \times 1$
- $b$ - bias scalar, broadcast across all $n$ examples
- $\hat{\mathbf{y}}$ - prediction vector, shape $n \times 1$
Here, $X$ is $n \times p$, $\mathbf{w}$ is $p \times 1$, and $X\mathbf{w}$ is $n \times 1$ - one prediction per example. This single matrix operation replaces a loop over all training examples.
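A minimal NumPy sketch of the vectorized form, using a small made-up dataset; `X @ w + b` computes all $n$ predictions in one operation (here `w` has shape `(p,)`, the 1-D equivalent of the $p \times 1$ column vector):

```python
import numpy as np

# Made-up data matrix: n = 4 examples, p = 3 features (size, bedrooms, distance)
X = np.array([
    [1500, 3, 10],
    [2000, 4,  5],
    [ 850, 2, 12],
    [1200, 3,  8],
])
w = np.array([0.2, 15.0, -4.0])  # weight vector, shape (p,)
b = 50                           # bias scalar, broadcast to every row

y_hat = X @ w + b                # shape (n,): one prediction per example
print(y_hat)  # [355. 490. 202. 303.]
```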
Interactive example
Adjust the weight and bias sliders and watch the regression line move over the scatter plot.
Coming soon