What Is a Decision Boundary?
The decision boundary is the surface in feature space where the classifier is exactly on the fence — it assigns equal probability to both classes. Cross that surface and the predicted class flips.
A classifier's decision boundary is what it has actually learned. Visualizing it immediately reveals whether a model is underfitting, overfitting, or learning the right pattern — making it one of the most diagnostic tools in machine learning.
For logistic regression, the boundary is where P(y=1|x) = 0.5. Since P(y=1|x) = σ(w·x + b), and σ(z) = 0.5 exactly when z = 0, the boundary is:

w·x + b = 0

where:
- w: weight vector, perpendicular (normal) to the decision boundary
- x: input feature vector
- b: bias, shifts the decision boundary parallel to itself

Notice: the boundary is defined by the linear part, not the sigmoid. The sigmoid just converts scores to probabilities. The geometry is entirely determined by w and b.
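To see this concretely, here is a minimal NumPy sketch (the weights and points are made up for illustration, not fitted to anything): the sigmoid maps the score to a probability, but the predicted class is already determined by the sign of z.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative (not learned) parameters
w = np.array([2.0, -1.0])
b = 0.5

for x in [np.array([1.0, 3.0]),    # z < 0: class 0 side
          np.array([2.0, 1.0])]:   # z > 0: class 1 side
    z = w @ x + b
    print(f"z = {z:+.2f}, P(y=1) = {sigmoid(z):.3f}, class = {int(z > 0)}")

# On the boundary itself, z = 0 and the probability is exactly 0.5
print(sigmoid(0.0))   # 0.5
```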
The Geometry
In p dimensions, the equation w·x + b = 0 defines a (p-1)-dimensional hyperplane:
- 2D (two features): a line — w₁x₁ + w₂x₂ + b = 0
- 3D: a plane — w₁x₁ + w₂x₂ + w₃x₃ + b = 0
- p dimensions: a (p-1)-dimensional hyperplane
The weight vector w is perpendicular (normal) to the hyperplane. Moving along w increases z = w·x + b; moving perpendicular to w keeps z constant (you stay on the boundary).

The bias b shifts the boundary. If b = 0, the boundary passes through the origin. Changing b moves the boundary parallel to itself — closer to or farther from the origin. Without b, the model would be forced to have the boundary pass through the origin, which is rarely where it should be.
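Both facts are easy to verify numerically. In the sketch below, w, b, and x are arbitrary illustrative values; the starting point x happens to lie exactly on the boundary:

```python
import numpy as np

w = np.array([3.0, 4.0])        # arbitrary weight vector
b = -2.0
x = np.array([2.0, -1.0])       # a point that happens to lie on the boundary

def score(point, bias):
    return w @ point + bias

w_perp = np.array([-4.0, 3.0])  # perpendicular to w (w @ w_perp == 0)

print(score(x, b))                 # 0.0  (on the boundary)
print(score(x + 0.5 * w, b))       # 12.5 (moving along w raises the score)
print(score(x + 0.5 * w_perp, b))  # 0.0  (moving perpendicular keeps it constant)
print(score(x, b + 1.0))           # 1.0  (changing b shifts every score equally)
```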
Which Side Is Which?
- z = w·x + b: linear score at point x; positive means class 1 side, negative means class 0 side

Concrete example: w = (1, −1), b = 0. The boundary is x₁ − x₂ = 0, i.e., the diagonal line x₁ = x₂. Points with x₁ > x₂ (so z > 0) are class 1; points with x₁ < x₂ (so z < 0) are class 0.
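Checking a diagonal boundary of this kind in code (using w = (1, −1), b = 0 as the illustrative parameters):

```python
import numpy as np

w = np.array([1.0, -1.0])   # boundary: x1 - x2 = 0, the diagonal x1 = x2
b = 0.0

for x in [np.array([2.0, 1.0]),    # x1 > x2 -> z > 0 -> class 1
          np.array([1.0, 2.0]),    # x1 < x2 -> z < 0 -> class 0
          np.array([1.5, 1.5])]:   # x1 == x2 -> z = 0, on the boundary
    z = w @ x + b
    side = "class 1" if z > 0 else ("class 0" if z < 0 else "boundary")
    print(f"x = {x}, z = {z:+.1f}, {side}")
```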
Linearly Separable Data
Sometimes a single hyperplane can perfectly separate all class-0 examples from all class-1 examples. The data is then called linearly separable.
In real datasets, perfect separation is rare. Classes overlap, examples are mislabeled, and real boundaries are fuzzy. Logistic regression handles this gracefully: it finds the hyperplane that minimizes cross-entropy loss, placing the boundary where it makes the fewest — or least costly — errors.
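A quick sketch of that behavior with scikit-learn on deliberately overlapping synthetic data (the blob centers, scale, and random seed are all arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two Gaussian blobs whose tails overlap: no hyperplane separates them perfectly
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)
print(f"Training accuracy: {acc:.0%}")  # high, but below 100%: the classes overlap
```

The fitted hyperplane lands between the two clusters, misclassifying only the points deep in the overlap region.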
When Linear Boundaries Aren't Enough
The XOR problem demonstrates a fundamental limit:
- Point (0, 0) → class 0
- Point (1, 1) → class 0
- Point (0, 1) → class 1
- Point (1, 0) → class 1
No single line can put (0, 0) and (1, 1) on one side and (0, 1) and (1, 0) on the other. Other patterns logistic regression cannot capture: concentric circles (one class inside a ring, the other outside), spirals, any crescent or non-convex boundary.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR dataset: four points, two classes that no straight line can separate
X_xor = np.array([[0, 0],   # class 0
                  [1, 1],   # class 0
                  [0, 1],   # class 1
                  [1, 0]])  # class 1
y_xor = np.array([0, 0, 1, 1])

# Try logistic regression (linear decision boundary only)
clf = LogisticRegression()
clf.fit(X_xor, y_xor)
preds = clf.predict(X_xor)
accuracy = (preds == y_xor).mean()
print(f"Logistic regression accuracy on XOR: {accuracy:.0%}")
# 50% or 75% — never 100%, because no straight line can separate all four points

# The learned boundary: w·x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"Boundary: {w[0]:.2f}·x₁ + {w[1]:.2f}·x₂ + {b:.2f} = 0")
print()

# Score and prediction for each point
for xi, yi, pi in zip(X_xor, y_xor, preds):
    score = np.dot(w, xi) + b
    correct = "✓" if pi == yi else "✗"
    print(f"  x={xi}, true={yi}, score={score:+.2f}, predicted={pi} {correct}")
# At least one point is always wrong — XOR is fundamentally non-linear
```
The Solution: Neural Networks
Neural networks solve this by learning feature transformations. Hidden layers compute new representations of the input, making the problem linearly separable in a transformed space even if it wasn't in the original space.
For XOR: if a hidden layer learns to compute a new feature capturing the XOR structure, then a single logistic output on that feature can perfectly classify the problem. The network learns both the transformation and the final linear boundary — jointly, end-to-end through gradient descent.
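To make this concrete, here is a tiny two-layer network with hand-picked (not learned) weights; a real network would find weights like these by gradient descent. The hidden units approximate OR and AND of the inputs, and the output unit draws a linear boundary in that transformed space: "OR and not AND" is exactly XOR.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hand-picked weights (illustrative, not trained)
W1 = np.array([[10.0, 10.0],   # hidden unit 1: fires when x1 + x2 > 0.5 (OR)
               [10.0, 10.0]])  # hidden unit 2: fires when x1 + x2 > 1.5 (AND)
b1 = np.array([-5.0, -15.0])
W2 = np.array([10.0, -10.0])   # output: OR minus AND, i.e. XOR
b2 = -5.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h = sigmoid(X @ W1.T + b1)     # new features: the hidden-layer representation
p = sigmoid(h @ W2 + b2)       # a single logistic unit on those features
print((p > 0.5).astype(int))   # [0 1 1 0]: XOR, now linearly solvable
```

In the transformed space (h₁, h₂), the four points become linearly separable, so the final logistic unit succeeds where logistic regression on the raw inputs could not.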
More layers → more complex transformations → more complex boundaries. This is why deep networks can model patterns in images and text that linear models have no hope of capturing.
Interactive example
Adjust the weight vector and bias to move the decision boundary, and see which points flip class.
Coming soon