Neural Networks
Lesson 1 ⏱ 10 min

The single neuron

Video coming soon

The Perceptron: From Biology to Math

Visual walkthrough of the weighted sum, activation function, and decision boundary.

⏱ ~6 min


Quick refresher

Weighted sums (dot products)

A dot product multiplies matching elements and adds the results. w · x = w₁x₁ + w₂x₂ + ... + wₙxₙ. It's a compact way of writing a weighted sum.

Example

w = [2, -1], x = [3, 4] → 2·3 + (-1)·4 = 6 - 4 = 2.
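The same example, written as a couple of lines of plain Python:

```python
# The refresher example as code: a dot product is just multiply-and-add.
w = [2, -1]
x = [3, 4]
dot = sum(wi * xi for wi, xi in zip(w, x))
print(dot)  # 2*3 + (-1)*4 = 2
```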

The Simplest Possible Brain Cell

A neuron takes a bunch of numbers, weighs each one by importance, adds them up, and produces a single output. That's it. The biological framing is evocative but the math is completely elementary.

The perceptron is the ancestor of every modern neural network. Understanding what a single neuron computes — and why it's limited — is the foundation for everything that follows, from deep learning to transformers.

Think of a single neuron like a simple scoring formula. Imagine you're deciding whether to approve a loan application. You have three signals: income (say, $80K), years of employment (5 years), and existing debt ($20K). You don't treat them equally — income matters most, debt is a red flag, employment matters a bit. So you multiply each signal by its importance score, add them up, and compare to a threshold. That is a perceptron.
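If you wanted to sketch that loan score in code, it might look like the snippet below. The importance weights and the cutoff are made-up numbers, chosen only to show the shape of the computation:

```python
# A toy scoring sketch for the loan example above. The weights and the
# threshold are hypothetical values, purely for illustration.
income = 80          # in $K
years_employed = 5
debt = 20            # in $K

w_income, w_years, w_debt = 0.5, 2.0, -1.0   # hypothetical importance scores
threshold = 25                                # hypothetical cutoff

score = w_income * income + w_years * years_employed + w_debt * debt
approve = score >= threshold

print(score, approve)   # 40.0 + 10.0 - 20.0 = 30.0 -> True (above the cutoff)
```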

Here's the full computation for a single artificial neuron:

Step 1 — Weighted sum:

z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = \mathbf{w} \cdot \mathbf{x} + b

w_i: weight for input i - how important is this input?
x_i: input value i
b: bias - shifts the result up or down
z: pre-activation sum

Step 2 — Activation:

a = \sigma(z)

a: neuron output (its 'activation')
σ: the sigmoid function, or any other nonlinearity

That's the whole neuron. If you use a sigmoid activation, then a = \sigma(\mathbf{w} \cdot \mathbf{x} + b). That is exactly logistic regression: take a weighted sum, then squash it into the range (0, 1). A single sigmoid neuron IS logistic regression. Different vocabulary, same math.
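Here is a minimal sketch of the two-step computation in plain Python. The inputs, weights, and bias are arbitrary example numbers; a logistic regression model with the same weights would produce exactly the same output:

```python
import math

# Step 1: weighted sum plus bias. Step 2: squash through the sigmoid.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # step 1: pre-activation z
    return sigmoid(z)                               # step 2: activation a

x = [3.0, 4.0]    # arbitrary inputs
w = [2.0, -1.0]   # arbitrary weights
b = -1.0          # arbitrary bias

print(neuron(x, w, b))   # sigmoid(2*3 - 1*4 - 1) = sigmoid(1) ≈ 0.731
```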

The Original Perceptron (1958)

Frank Rosenblatt's perceptron used a step function instead of sigmoid:

\text{activation}(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}

The intuition is compelling — the neuron either fires or it doesn't, like a biological neuron. But it has a fatal flaw.

The step function's gradient is zero everywhere (and undefined at z = 0), which makes training by gradient descent impossible.

The original perceptron was trained with a special rule that only worked for linearly separable data. In 1969, Minsky and Papert showed perceptrons couldn't solve XOR — a non-linearly-separable problem. Research funding dried up. The 1970s AI winter followed.
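For the curious, here is a minimal sketch of that special rule, the perceptron learning rule: nudge each weight by the prediction error times the input. The learning rate and epoch count below are arbitrary choices; the point is that it settles on a linearly separable problem like AND but never settles on XOR:

```python
# A minimal sketch of Rosenblatt's perceptron learning rule (step activation).
def train_perceptron(data, epochs=20, lr=1.0):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            z = w[0] * x1 + w[1] * x2 + b
            y_hat = 1 if z >= 0 else 0     # step activation
            error = y - y_hat
            w[0] += lr * error * x1        # nudge weights toward the target
            w[1] += lr * error * x2
            b += lr * error
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # linearly separable
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # not separable

print(train_perceptron(AND))   # settles on a separating line
print(train_perceptron(XOR))   # keeps oscillating, no number of epochs helps
```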

Making It Differentiable

The fix: replace the step function with a smooth approximation. Sigmoid is S-shaped, bounded between 0 and 1, and differentiable everywhere. The letter e here is Euler's number, approximately 2.718, a mathematical constant that appears naturally in exponential growth and decay. The minus sign in e^{-z} means: for large positive z, e^{-z} is near 0, so the output is near 1. For large negative z, e^{-z} is huge, so the output is near 0:

\sigma(z) = \frac{1}{1 + e^{-z}}

e: Euler's number ≈ 2.718
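A quick numeric check of that behavior, using a plain-Python sigmoid and a handful of arbitrary sample values of z:

```python
import math

# Large positive z pushes the output toward 1; large negative z toward 0.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10, -2, 0, 2, 10):
    print(z, round(sigmoid(z), 4))
# -10 -> 0.0   -2 -> 0.1192   0 -> 0.5   2 -> 0.8808   10 -> 1.0
```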
Interactive: Sigmoid vs. Step Function

Sharpening the sigmoid means computing σ(t·z), where t controls the steepness. As t → ∞ it approaches the step function, but the gradient collapses to 0 almost everywhere, which is exactly what made the step function untrainable.

Sigmoid roughly mimics the step function's behavior while being differentiable everywhere. That unlocks gradient descent, backpropagation, and everything that follows.
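To see that trade-off concretely, the sketch below evaluates the slope of σ(t·z) at an arbitrary sample point z = 1, using the standard identity σ'(z) = σ(z)(1 − σ(z)) and the chain rule:

```python
import math

# Why the sharpened sigmoid σ(t·z) loses its useful gradient:
# by the chain rule, its slope at z is t * σ'(t*z) = t * σ(t*z) * (1 - σ(t*z)).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def slope_at(z, t):
    s = sigmoid(t * z)
    return t * s * (1.0 - s)

for t in (1, 5, 25, 100):
    print(t, slope_at(1.0, t))
# As t grows, the curve looks more like a step and the slope away from z = 0
# collapses toward zero, so there is nothing left for gradient descent to follow.
```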

Why One Neuron Isn't Enough

A single neuron, regardless of activation function, can only produce a linear decision boundary: a straight line in 2D, a flat hyperplane in higher dimensions.

It can't learn XOR. It can't separate concentric circles. It can't model any pattern where the true boundary is curved.

To get nonlinear boundaries, you need to compose multiple neurons across layers. A single neuron is the atom; a network is the molecule.
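As a preview of why composition helps, here is a tiny hand-wired network that computes XOR. The weights are hand-picked rather than learned, purely to show that stacking neurons produces a boundary no single neuron can draw:

```python
# A minimal sketch: two hidden neurons plus one output neuron solve XOR.
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hidden neuron 1: fires if at least one input is on
    h2 = step(x1 + x2 - 1.5)      # hidden neuron 2: fires only if both inputs are on
    return step(h1 - h2 - 0.5)    # output: at least one, but not both = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))    # prints 0, 1, 1, 0
```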

Quiz

Question 1 of 3

A single neuron with sigmoid activation computes the same thing as...