Classification
Lesson 5 ⏱ 10 min

The decision boundary

Video coming soon

Decision Boundaries - The Geometry of Classification

Visual derivation of the linear decision boundary from w·x + b = 0, the role of w as a normal vector, how b shifts the boundary, and why XOR requires a nonlinear boundary.

⏱ ~6 min

🧮 Quick refresher

The hyperplane equation

A hyperplane in p dimensions is the set of all points x satisfying w·x + b = 0. In 2D this is a line; in 3D it is a plane. The vector w is perpendicular (normal) to the hyperplane.

Example

w = [1, -1], b = 0: the boundary is x₁ - x₂ = 0, i.e. the line x₁ = x₂.

Points where x₁ > x₂ are on the positive side (class 1); x₁ < x₂ are on the negative side (class 0).

What Is a Decision Boundary?

The decision boundary is the surface in feature space where the classifier is exactly on the fence — it assigns equal probability to both classes. Cross that surface and the predicted class flips.

A classifier's decision boundary is what it has actually learned. Visualizing it immediately reveals whether a model is underfitting, overfitting, or learning the right pattern — making it one of the most diagnostic tools in machine learning.
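
To see what "visualizing the boundary" means in practice, here is a minimal sketch: evaluate a fitted classifier on a dense grid of points and color each region by the predicted class. The dataset, grid resolution, and plotting choices below are illustrative, not from the lesson.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Illustrative two-feature, two-class dataset
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Evaluate the classifier on a dense grid covering the data
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
grid_preds = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Color each region by predicted class; the border between colors is the decision boundary
plt.contourf(xx, yy, grid_preds, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()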

For logistic regression, the boundary is where $\hat{y} = 0.5$. Since $\hat{y} = \sigma(z)$ and $\sigma(z) = 0.5$ exactly when $z = 0$, the decision boundary is:

$$\mathbf{w} \cdot \mathbf{x} + b = 0$$

  • $\mathbf{w}$: weight vector, perpendicular (normal) to the decision boundary
  • $\mathbf{x}$: input feature vector
  • $b$: bias, shifts the decision boundary parallel to itself

Notice: the boundary is defined by the linear part, not the sigmoid. The sigmoid just converts scores to probabilities. The geometry is entirely determined by $\mathbf{w}$ and $b$.
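
A quick numerical check of this point (the weights and sample points below are made up for illustration): the predicted probability is exactly 0.5 precisely when the linear score is zero.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])   # illustrative weights
b = -0.5                    # illustrative bias

x_on_boundary = np.array([0.5, 0.5])   # chosen so that w·x + b = 0
x_off_boundary = np.array([1.0, 0.0])  # w·x + b = 1.5 > 0

for x in (x_on_boundary, x_off_boundary):
    z = w @ x + b
    print(f"x={x}, z={z:+.2f}, sigmoid(z)={sigmoid(z):.3f}")
# On the boundary z = 0 and the probability is exactly 0.5;
# the sigmoid never moves the boundary, it only maps scores to probabilities.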

The Geometry

In $p$ dimensions, the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$ defines a $(p-1)$-dimensional hyperplane:

  • 2D (two features): a line, $w_1 x_1 + w_2 x_2 + b = 0$
  • 3D: a plane, $w_1 x_1 + w_2 x_2 + w_3 x_3 + b = 0$
  • $p$ dimensions: a $(p-1)$-dimensional hyperplane

The weight vector $\mathbf{w}$ is perpendicular (normal) to the hyperplane. Moving along $\mathbf{w}$ increases $\mathbf{w} \cdot \mathbf{x}$; moving perpendicular to $\mathbf{w}$ keeps $\mathbf{w} \cdot \mathbf{x}$ constant (you stay on the boundary).

The bias $b$ shifts the boundary. If $b = 0$, the boundary passes through the origin. Changing $b$ moves the boundary parallel to itself — closer to or farther from the origin. Without $b$, the model would be forced to have the boundary pass through $\mathbf{x} = \mathbf{0}$, which is rarely where it should be.
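
Both geometric facts can be checked directly with NumPy. A small sketch (the particular $\mathbf{w}$, $b$, and step sizes are arbitrary): stepping along $\mathbf{w}$ changes the score, stepping perpendicular to $\mathbf{w}$ does not, and the boundary sits at distance $|b|/\lVert\mathbf{w}\rVert$ from the origin.

import numpy as np

w = np.array([3.0, 4.0])   # illustrative weight vector, ‖w‖ = 5
b = -10.0                  # illustrative bias

# A point exactly on the boundary: x0 = -b * w / ‖w‖², so w·x0 + b = 0
x0 = -b * w / np.dot(w, w)
print("score at x0:        ", np.dot(w, x0) + b)                 # ~0.0

# Step along w: the score changes (you leave the boundary)
print("step along w:       ", np.dot(w, x0 + 0.1 * w) + b)       # > 0

# Step perpendicular to w: the score stays 0 (still on the boundary)
w_perp = np.array([-w[1], w[0]])
print("step perp to w:     ", np.dot(w, x0 + 0.1 * w_perp) + b)  # ~0.0

# Distance from the origin to the boundary is |b| / ‖w‖
print("distance from origin:", abs(b) / np.linalg.norm(w))       # 2.0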

Which Side Is Which?

$$\begin{cases} z > 0 \implies \sigma(z) > 0.5 \implies \text{predict class 1} \\ z < 0 \implies \sigma(z) < 0.5 \implies \text{predict class 0} \end{cases}$$

  • $z$: linear score at point $\mathbf{x}$; positive means the class-1 side, negative means the class-0 side

Concrete example: $w_1 = 1,\ w_2 = -1,\ b = 0$. The boundary is $x_1 - x_2 = 0$, i.e., the diagonal line $x_1 = x_2$. Points with $x_1 > x_2$ (so $z > 0$) lie below the diagonal and are class 1. Points with $x_1 < x_2$ lie above it and are class 0.
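
A quick check of this example in NumPy (the sample points are arbitrary):

import numpy as np

w = np.array([1.0, -1.0])
b = 0.0

points = np.array([[2.0, 1.0],   # x1 > x2  ->  z > 0  ->  class 1
                   [1.0, 2.0],   # x1 < x2  ->  z < 0  ->  class 0
                   [1.5, 1.5]])  # x1 = x2  ->  z = 0  ->  on the boundary

for x in points:
    z = w @ x + b
    side = "class 1" if z > 0 else "class 0" if z < 0 else "on the boundary"
    print(f"x={x}, z={z:+.1f} -> {side}")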

Linearly Separable Data

Sometimes a single hyperplane can perfectly separate all class-0 examples from all class-1 examples. The data is then called linearly separable.

In real datasets, perfect separation is rare. Classes overlap, examples are mislabeled, and real boundaries are fuzzy. Logistic regression handles this gracefully: it finds the hyperplane that minimizes cross-entropy loss, placing the boundary where it makes the fewest — or least costly — errors.
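
As an illustration (the dataset below is synthetic and the overlap is deliberate), logistic regression still fits a sensible boundary even though no hyperplane can classify every point correctly:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two overlapping clusters: the large cluster_std makes perfect separation impossible
X, y = make_blobs(n_samples=400, centers=2, cluster_std=3.0, random_state=0)

clf = LogisticRegression().fit(X, y)
print(f"Training accuracy: {clf.score(X, y):.0%}")   # typically well below 100%
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
# The fitted hyperplane minimizes cross-entropy loss, so it ends up where
# the remaining errors are as few and as low-confidence as possible.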

When Linear Boundaries Aren't Enough

The XOR problem demonstrates a fundamental limit:

  • Point (0, 0) → class 0
  • Point (1, 1) → class 0
  • Point (0, 1) → class 1
  • Point (1, 0) → class 1

No single straight line can place (0, 0) and (1, 1) on one side and (0, 1) and (1, 0) on the other: the two classes occupy opposite diagonals of the unit square. Other patterns logistic regression cannot capture: concentric circles (one class inside a ring, the other outside), spirals, and any crescent-shaped or non-convex boundary.

import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR dataset: four points, two classes that no straight line can separate
X_xor = np.array([[0, 0],   # class 0
                  [1, 1],   # class 0
                  [0, 1],   # class 1
                  [1, 0]])  # class 1
y_xor = np.array([0, 0, 1, 1])

# Try logistic regression (linear decision boundary only)
clf = LogisticRegression()
clf.fit(X_xor, y_xor)
preds = clf.predict(X_xor)
accuracy = (preds == y_xor).mean()
print(f"Logistic regression accuracy on XOR: {accuracy:.0%}")
# 50% or 75% — never 100%, because no straight line can separate all four points

# The learned boundary: w·x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"Boundary: {w[0]:.2f}·x₁ + {w[1]:.2f}·x₂ + {b:.2f} = 0")
print()

# Score and prediction for each point
for xi, yi, pi in zip(X_xor, y_xor, preds):
    score = np.dot(w, xi) + b
    correct = "✓" if pi == yi else "✗"
    print(f"  x={xi}, true={yi}, score={score:+.2f}, predicted={pi} {correct}")
# At least one point is always wrong — XOR is fundamentally non-linear

The Solution: Neural Networks

Neural networks solve this by learning feature transformations. Hidden layers compute new representations of the input, making the problem linearly separable in a transformed space even if it wasn't in the original space.

For XOR: if a hidden layer learns to compute a new feature capturing the XOR structure, then a single logistic output on that feature can perfectly classify the problem. The network learns both the transformation and the final linear boundary — jointly, end-to-end through gradient descent.

More layers → more complex transformations → more complex boundaries. This is why deep networks can model patterns in images and text that linear models have no hope of capturing.
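
To make this concrete, here is a minimal sketch using scikit-learn's MLPClassifier on the same XOR data. The hidden-layer size, activation, solver, and random seed are illustrative choices; with only four training points, some seeds may need different settings to converge.

import numpy as np
from sklearn.neural_network import MLPClassifier

X_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_xor = np.array([0, 0, 1, 1])

# One small hidden layer learns a nonlinear transformation of the inputs;
# a single logistic output on the transformed features can then separate the classes.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=10000, random_state=0)
mlp.fit(X_xor, y_xor)

print("Predictions:", mlp.predict(X_xor))       # ideally [0, 0, 1, 1]
print("Accuracy:   ", mlp.score(X_xor, y_xor))  # often 1.0, unlike the linear model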

Interactive example

Adjust the weight vector and bias to move the decision boundary - see which points flip class

Coming soon

Quiz

1 / 3

For logistic regression with two features, the decision boundary is...