Classification
Lesson 5 ⏱ 10 min

The decision boundary

Video coming soon

Decision Boundaries - The Geometry of Classification

Visual derivation of the linear decision boundary from w·x + b = 0, the role of w as a normal vector, how b shifts the boundary, and why XOR requires a nonlinear boundary.

⏱ ~6 min

🧮 Quick refresher

The hyperplane equation

A hyperplane in p dimensions is the set of all points x satisfying w·x + b = 0. In 2D this is a line; in 3D it is a plane. The vector w is perpendicular (normal) to the hyperplane.

Example

w = [1, -1], b = 0: the boundary is x₁ - x₂ = 0, i.e. the line x₁ = x₂.

Points where x₁ > x₂ are on the positive side (class 1); x₁ < x₂ are on the negative side (class 0).

What Is a Decision Boundary?

The decision boundary is the surface in feature space where the classifier is exactly on the fence — it assigns equal probability to both classes. Cross that surface and the predicted class flips.

A classifier's decision boundary is what it has actually learned. Visualizing it immediately reveals whether a model is underfitting, overfitting, or learning the right pattern — making it one of the most diagnostic tools in machine learning.
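
To see what "visualizing the boundary" means in practice, here is a minimal sketch: evaluate a fitted classifier on a dense grid of points and color each region by the predicted class. The dataset, grid resolution, and plotting choices below are illustrative, not from the lesson.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Illustrative two-feature, two-class dataset
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Evaluate the classifier on a dense grid covering the data
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
grid_preds = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Color each region by predicted class; the border between colors is the decision boundary
plt.contourf(xx, yy, grid_preds, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()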

For logistic regression, the boundary is where $\hat{y} = 0.5$. Since $\hat{y} = \sigma(z)$ and $\sigma(z) = 0.5$ exactly when $z = 0$, the decision boundary is:

$$\mathbf{w} \cdot \mathbf{x} + b = 0$$

  • $\mathbf{w}$: weight vector, perpendicular (normal) to the decision boundary
  • $\mathbf{x}$: input feature vector
  • $b$: bias, shifts the decision boundary parallel to itself

Notice: the boundary is defined by the linear part, not the sigmoid. The sigmoid just converts scores to probabilities. The geometry is entirely determined by $\mathbf{w}$ and $b$.
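
A quick numerical check of this point (the weights and sample points below are made up for illustration): the predicted probability is exactly 0.5 precisely when the linear score is zero.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])   # illustrative weights
b = -0.5                    # illustrative bias

x_on_boundary = np.array([0.5, 0.5])   # chosen so that w·x + b = 0
x_off_boundary = np.array([1.0, 0.0])  # w·x + b = 1.5 > 0

for x in (x_on_boundary, x_off_boundary):
    z = w @ x + b
    print(f"x={x}, z={z:+.2f}, sigmoid(z)={sigmoid(z):.3f}")
# On the boundary z = 0 and the probability is exactly 0.5;
# the sigmoid never moves the boundary, it only maps scores to probabilities.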

The Geometry

In $p$ dimensions, the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$ defines a $(p-1)$-dimensional hyperplane:

  • 2D (two features): a line, $w_1 x_1 + w_2 x_2 + b = 0$
  • 3D: a plane, $w_1 x_1 + w_2 x_2 + w_3 x_3 + b = 0$
  • $p$ dimensions: a $(p-1)$-dimensional hyperplane

The weight vector $\mathbf{w}$ is perpendicular (normal) to the hyperplane. Moving along $\mathbf{w}$ increases $\mathbf{w} \cdot \mathbf{x}$; moving perpendicular to $\mathbf{w}$ keeps $\mathbf{w} \cdot \mathbf{x}$ constant (you stay on the boundary).

The bias $b$ shifts the boundary. If $b = 0$, the boundary passes through the origin. Changing $b$ moves the boundary parallel to itself — closer to or farther from the origin. Without $b$, the model would be forced to have the boundary pass through $\mathbf{x} = \mathbf{0}$, which is rarely where it should be.
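
Both geometric facts can be checked directly with NumPy. A small sketch (the particular $\mathbf{w}$, $b$, and step sizes are arbitrary): stepping along $\mathbf{w}$ changes the score, stepping perpendicular to $\mathbf{w}$ does not, and the boundary sits at distance $|b|/\lVert\mathbf{w}\rVert$ from the origin.

import numpy as np

w = np.array([3.0, 4.0])   # illustrative weight vector, ‖w‖ = 5
b = -10.0                  # illustrative bias

# A point exactly on the boundary: x0 = -b * w / ‖w‖², so w·x0 + b = 0
x0 = -b * w / np.dot(w, w)
print("score at x0:        ", np.dot(w, x0) + b)                 # ~0.0

# Step along w: the score changes (you leave the boundary)
print("step along w:       ", np.dot(w, x0 + 0.1 * w) + b)       # > 0

# Step perpendicular to w: the score stays 0 (still on the boundary)
w_perp = np.array([-w[1], w[0]])
print("step perp to w:     ", np.dot(w, x0 + 0.1 * w_perp) + b)  # ~0.0

# Distance from the origin to the boundary is |b| / ‖w‖
print("distance from origin:", abs(b) / np.linalg.norm(w))       # 2.0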

Which Side Is Which?

$$\begin{cases} z > 0 \implies \sigma(z) > 0.5 \implies \text{predict class 1} \\ z < 0 \implies \sigma(z) < 0.5 \implies \text{predict class 0} \end{cases}$$

  • $z$: linear score at point $\mathbf{x}$; positive means the class-1 side, negative means the class-0 side

Concrete example: $w_1 = 1,\ w_2 = -1,\ b = 0$. The boundary is $x_1 - x_2 = 0$, i.e., the diagonal line $x_1 = x_2$. Points with $x_1 > x_2$ (so $z > 0$) lie below the diagonal and are class 1. Points with $x_1 < x_2$ lie above it and are class 0.
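
A quick check of this example in NumPy (the sample points are arbitrary):

import numpy as np

w = np.array([1.0, -1.0])
b = 0.0

points = np.array([[2.0, 1.0],   # x1 > x2  ->  z > 0  ->  class 1
                   [1.0, 2.0],   # x1 < x2  ->  z < 0  ->  class 0
                   [1.5, 1.5]])  # x1 = x2  ->  z = 0  ->  on the boundary

for x in points:
    z = w @ x + b
    side = "class 1" if z > 0 else "class 0" if z < 0 else "on the boundary"
    print(f"x={x}, z={z:+.1f} -> {side}")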

Linearly Separable Data

Sometimes a single hyperplane can perfectly separate all class-0 examples from all class-1 examples. The data is then called linearly separable.

In real datasets, perfect separation is rare. Classes overlap, examples are mislabeled, and real boundaries are fuzzy. Logistic regression handles this gracefully: it finds the hyperplane that minimizes cross-entropy loss, placing the boundary where it makes the fewest — or least costly — errors.
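
As an illustration (the dataset below is synthetic and the overlap is deliberate), logistic regression still fits a sensible boundary even though no hyperplane can classify every point correctly:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two overlapping clusters: the large cluster_std makes perfect separation impossible
X, y = make_blobs(n_samples=400, centers=2, cluster_std=3.0, random_state=0)

clf = LogisticRegression().fit(X, y)
print(f"Training accuracy: {clf.score(X, y):.0%}")   # typically well below 100%
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
# The fitted hyperplane minimizes cross-entropy loss, so it ends up where
# the remaining errors are as few and as low-confidence as possible.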

When Linear Boundaries Aren't Enough

The XOR problem demonstrates a fundamental limit:

  • Point (0, 0) → class 0
  • Point (1, 1) → class 0
  • Point (0, 1) → class 1
  • Point (1, 0) → class 1

No single straight line can place (0, 0) and (1, 1) on one side and (0, 1) and (1, 0) on the other: the two classes occupy opposite diagonals of the unit square. Other patterns logistic regression cannot capture: concentric circles (one class inside a ring, the other outside), spirals, and any crescent-shaped or non-convex boundary.

import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR dataset: four points, two classes that no straight line can separate
X_xor = np.array([[0, 0],   # class 0
                  [1, 1],   # class 0
                  [0, 1],   # class 1
                  [1, 0]])  # class 1
y_xor = np.array([0, 0, 1, 1])

# Try logistic regression (linear decision boundary only)
clf = LogisticRegression()
clf.fit(X_xor, y_xor)
preds = clf.predict(X_xor)
accuracy = (preds == y_xor).mean()
print(f"Logistic regression accuracy on XOR: {accuracy:.0%}")
# 50% or 75% — never 100%, because no straight line can separate all four points

# The learned boundary: w·x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"Boundary: {w[0]:.2f}·x₁ + {w[1]:.2f}·x₂ + {b:.2f} = 0")
print()

# Score and prediction for each point
for xi, yi, pi in zip(X_xor, y_xor, preds):
    score = np.dot(w, xi) + b
    correct = "✓" if pi == yi else "✗"
    print(f"  x={xi}, true={yi}, score={score:+.2f}, predicted={pi} {correct}")
# At least one point is always wrong — XOR is fundamentally non-linear

The Solution: Neural Networks

Neural networks solve this by learning feature transformations. Hidden layers compute new representations of the input, making the problem linearly separable in a transformed space even if it wasn't in the original space.

For XOR: if a hidden layer learns to compute a new feature capturing the XOR structure, then a single logistic output on that feature can perfectly classify the problem. The network learns both the transformation and the final linear boundary — jointly, end-to-end through gradient descent.

More layers → more complex transformations → more complex boundaries. This is why deep networks can model patterns in images and text that linear models have no hope of capturing.
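
To make this concrete, here is a minimal sketch using scikit-learn's MLPClassifier on the same XOR data. The hidden-layer size, activation, solver, and random seed are illustrative choices; with only four training points, some seeds may need different settings to converge.

import numpy as np
from sklearn.neural_network import MLPClassifier

X_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_xor = np.array([0, 0, 1, 1])

# One small hidden layer learns a nonlinear transformation of the inputs;
# a single logistic output on the transformed features can then separate the classes.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=10000, random_state=0)
mlp.fit(X_xor, y_xor)

print("Predictions:", mlp.predict(X_xor))       # ideally [0, 0, 1, 1]
print("Accuracy:   ", mlp.score(X_xor, y_xor))  # often 1.0, unlike the linear model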

Interactive example

Adjust the weight vector and bias to move the decision boundary - see which points flip class

Coming soon

Quiz

1 / 3

For logistic regression with two features, the decision boundary is...