The Loss Landscape: Your Training Target Visualized
3D animation of loss surfaces for linear regression (bowl) vs. neural networks (chaotic mountains).
⏱ ~5 min
Quick refresher
Functions of multiple variables
A function f(w₁, w₂) takes two inputs and returns one output. We can plot it as a 3D surface — like a terrain map where height = output value.
Example
f(w₁, w₂) = w₁² + w₂² is a bowl shape.
f(0,0)=0 (bottom), f(1,0)=1 (up the side).
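The bowl from the refresher can be checked in a few lines of Python (a minimal sketch; the function name `f` just mirrors the notation above):

```python
# The bowl-shaped function from the refresher: f(w1, w2) = w1^2 + w2^2.
def f(w1, w2):
    return w1**2 + w2**2

print(f(0, 0))  # 0 — the bottom of the bowl
print(f(1, 0))  # 1 — one unit up the side
print(f(3, 4))  # 25 — much higher up
```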
Training Is Navigation
To train a model, you need to answer one question over and over: which direction should I adjust the parameters to make predictions better?
The loss landscape is the map that answers this.
For a model with just two parameters, w₁ and w₂, we can draw a 3D surface:
x-axis: all possible values of w₁
y-axis: all possible values of w₂
Height at any point (w₁, w₂): the loss L(w₁, w₂) for those parameters
Training means walking downhill on this surface until you find a valley.
Real models have millions of parameters — you can't visualize that space. But the intuition transfers: there's a high-dimensional surface over parameter space, and gradient descent is your downhill-hiking algorithm.
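Here is a minimal sketch of that downhill hike on the two-parameter bowl L(w₁, w₂) = w₁² + w₂², whose gradient is (2w₁, 2w₂). The function and variable names are illustrative, not from any library:

```python
# Gradient descent on the bowl L(w1, w2) = w1^2 + w2^2.
def gradient_descent(w1, w2, lr=0.1, steps=100):
    for _ in range(steps):
        g1, g2 = 2 * w1, 2 * w2  # gradient of the loss at (w1, w2)
        w1 -= lr * g1            # step downhill along each axis
        w2 -= lr * g2
    return w1, w2

w1, w2 = gradient_descent(3.0, -4.0)
print(w1, w2)  # both end up vanishingly close to 0, the bottom of the bowl
```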
Interactive: Gradient Descent on a Non-Convex Function
x = 2.2000   f(x) = 2.7902   f'(x) = 8.7368   steps = 0
This function has two local minima — one near x ≈ -1.3 (deeper) and one near x ≈ 1.3. Where gradient descent ends up depends on the starting point and learning rate.
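The demo's exact function isn't shown here, so the sketch below uses a stand-in double-well, f(x) = (x² − 1.69)²/4 + 0.3x, which has the same character: a deeper minimum near x ≈ −1.38 and a shallower one near x ≈ 1.2, so the final resting point depends on where you start:

```python
# Stand-in double-well function (not the demo's exact f):
# f(x) = (x^2 - 1.69)^2 / 4 + 0.3x
# Deeper minimum near x ≈ -1.38, shallower one near x ≈ 1.2.
def df(x):
    return x * (x**2 - 1.69) + 0.3  # the derivative f'(x)

def descend(x, lr=0.05, steps=200):
    for _ in range(steps):
        x -= lr * df(x)  # step against the slope
    return x

print(descend(2.2))   # starts on the right: lands in the shallow right valley
print(descend(-0.5))  # starts left of the hump: lands in the deeper left valley
```

Neither run "knows" about the other valley: gradient descent only sees the local slope, which is exactly why the starting point matters on non-convex surfaces.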
Contour Maps
Instead of a 3D plot, you'll often see 2D contour plots — overhead views where each line connects points of equal loss. Like topographic maps.
The closer the contour lines are packed, the steeper the terrain. The gradient at any point is perpendicular to the contour lines, pointing toward higher values. Gradient descent steps in the opposite direction: straight downhill, crossing the contours at right angles.
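You can verify that perpendicularity numerically for the bowl, whose contour lines are circles around the origin (an illustrative sketch; the point (2, 1) is arbitrary):

```python
# For f(w1, w2) = w1^2 + w2^2, contours are circles centered at the origin.
# The gradient (2*w1, 2*w2) points radially outward, so its dot product with
# the circle's tangent direction (-w2, w1) should be exactly zero.
w1, w2 = 2.0, 1.0
grad = (2 * w1, 2 * w2)   # points toward higher loss
tangent = (-w2, w1)       # direction along the contour line
dot = grad[0] * tangent[0] + grad[1] * tangent[1]
print(dot)  # 0.0 — the gradient is perpendicular to the contour
```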
Convex Surfaces: The Easy Case
The loss surface for linear regression is a convex bowl.
L = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
L: mean squared error loss
n: number of training examples
yᵢ: true label for example i
ŷᵢ: model prediction for example i
A convex function has one critical property: any local minimum is also the global minimum. If you find a flat point (where the gradient is zero), you're at the best possible solution.
This is mathematically clean. But it only holds for simple linear models.
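To see convexity pay off, here is a sketch of gradient descent on the MSE loss for a one-parameter linear model y ≈ w·x (the toy dataset and function names are made up for illustration). Because the loss is a bowl, wildly different starting points reach the same global minimum:

```python
# Gradient descent on MSE for a one-parameter linear model y ≈ w * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the best w is 2

def fit(w, lr=0.02, steps=500):
    n = len(xs)
    for _ in range(steps):
        # dL/dw for L = (1/n) * sum((w*x - y)^2)
        grad = (2 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w

print(fit(-10.0))  # ≈ 2.0
print(fit(25.0))   # ≈ 2.0 — same valley, no matter where you start
```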
Non-Convex Surfaces: The Real World
Neural network loss surfaces are not convex. They're high-dimensional terrain with:
Local minima: valleys that aren't the global minimum. The gradient is zero (it looks like a bottom), but better valleys exist elsewhere.
Saddle points: flat points where some directions go downhill and others go uphill. The gradient is zero, but you're not at a minimum.
Flat plateaus: vast regions where the gradient is tiny, causing very slow progress. Can feel like training has stalled.
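The saddle point case can be made concrete with the textbook example f(w₁, w₂) = w₁² − w₂² (a sketch, not any particular network's loss): the gradient vanishes at the origin, yet moving along w₂ still lowers the value.

```python
# The classic saddle: f(w1, w2) = w1^2 - w2^2.
def f(w1, w2):
    return w1**2 - w2**2

def grad(w1, w2):
    return (2 * w1, -2 * w2)

print(grad(0.0, 0.0))  # zero in both directions — looks like a minimum
print(f(0.0, 0.1))     # negative: stepping along w2 goes downhill
print(f(0.1, 0.0))     # positive: stepping along w1 goes uphill
```

A zero gradient alone tells you nothing about which kind of flat point you're standing on, which is one reason non-convex training is hard.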
Quiz
1 / 3
In a loss landscape visualization, what does 'height' represent?