The Simplest Possible Brain Cell
A neuron takes a bunch of numbers, weighs each one by importance, adds them up, and produces a single output. That's it. The biological framing is evocative but the math is completely elementary.
The perceptron is the ancestor of every modern neural network. Understanding what a single neuron computes — and why it's limited — is the foundation for everything that follows, from deep learning to transformers.
Think of a single neuron like a simple scoring formula. Imagine you're deciding whether to approve a loan application. You have three signals: income (say, $80K), years of employment (5 years), and existing debt ($20K). You don't treat them equally — income matters most, debt is a red flag, employment matters a bit. So you multiply each signal by its importance score, add them up, and compare to a threshold. That is a perceptron.
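Here's that scoring formula as a quick Python sketch; the dollar amounts, importance scores, and threshold are made up for illustration:

```python
# A hand-wired "loan scoring" neuron: weighted sum of signals vs. a threshold.
# All weights and the threshold below are invented for illustration.
income = 80      # thousands of dollars
employment = 5   # years
debt = 20        # thousands of dollars

w_income, w_employment, w_debt = 0.5, 0.3, -0.8   # importance scores (debt counts against)
threshold = 30

score = w_income * income + w_employment * employment + w_debt * debt
print(score)                                       # 25.5
print("approve" if score >= threshold else "deny")  # deny
```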
Here's the full computation for a single artificial neuron:
Step 1 — Weighted sum:
z = w₁·x₁ + w₂·x₂ + … + wₙ·xₙ + b

- wᵢ: weight for input i - how important is this input?
- xᵢ: input value i
- b: bias - shifts the result up or down
- z: the pre-activation sum
Step 2 — Activation:
a = σ(z)

- a: neuron output (its 'activation')
- σ: the sigmoid function, or any other nonlinearity
That's the whole neuron. If you use a sigmoid activation, then a = σ(w₁·x₁ + … + wₙ·xₙ + b). That is exactly logistic regression: take a weighted sum, then squash it into the range (0, 1). A single sigmoid neuron IS logistic regression. Different vocabulary, same math.
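A minimal Python sketch of that two-step computation, with placeholder weights and inputs:

```python
import math

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Step 1: weighted sum. Step 2: nonlinearity. That's the whole neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # pre-activation
    return sigmoid(z)                                        # activation

# Arbitrary example values (same three signals as the loan example above)
x = [80, 5, 20]            # income, years employed, debt
w = [0.05, 0.3, -0.08]     # importance of each signal
b = -4.0                   # bias shifts the decision threshold

print(neuron(x, w, b))     # a number in (0, 1), readable as P(approve)
```

Fit w and b to labeled data (say, past loan decisions) and this is textbook logistic regression.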
The Original Perceptron (1958)
Frank Rosenblatt's perceptron used a step function instead of sigmoid:

output = 1 if z ≥ 0, otherwise 0
The intuition is compelling — the neuron either fires or it doesn't, like a biological neuron. But it has a fatal flaw.
The step function's derivative is zero everywhere (and undefined at the jump), which makes training by gradient descent impossible.
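A tiny numeric sketch of the problem, with arbitrary values: nudging a weight under a step activation usually changes nothing at all, so there is no gradient signal to follow.

```python
def step(z):
    return 1 if z >= 0 else 0

x, w, b = 2.0, 0.3, -1.0                 # arbitrary single-input neuron
out = step(w * x + b)                    # z = -0.4  -> output 0

# Nudge the weight slightly, as gradient descent would:
out_nudged = step((w + 0.01) * x + b)    # z = -0.38 -> output still 0
print(out, out_nudged)                   # 0 0 -- no change, so no gradient to follow
```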
The original perceptron was trained with a special rule that only worked for linearly separable data. In 1969, Minsky and Papert showed perceptrons couldn't solve XOR — a non-linearly-separable problem. Research funding dried up. The 1970s AI winter followed.
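For the curious, here is a sketch of that training rule (the classic perceptron update, with made-up hyperparameters): it converges on a linearly separable problem like AND, but cycles forever on XOR.

```python
def train_perceptron(data, epochs=100, lr=1.0):
    """Classic perceptron rule: w += lr * (target - prediction) * x."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            err = target - pred
            if err != 0:
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
                errors += 1
        if errors == 0:            # converged: every point classified correctly
            return w, b, True
    return w, b, False             # never converged within the epoch budget

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND))   # converges: AND is linearly separable
print(train_perceptron(XOR))   # never converges: XOR is not
```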
Making It Differentiable
The fix: replace the step function with a smooth approximation. Sigmoid is S-shaped, bounded between 0 and 1, and differentiable everywhere. The letter e here is Euler's number, approximately 2.718, a mathematical constant that appears naturally in exponential growth and decay. The minus sign in e^(−z) means: for large positive z, e^(−z) is near 0, so the output is near 1. For large negative z, e^(−z) is huge, so the output is near 0:

σ(z) = 1 / (1 + e^(−z))
- e: Euler's number ≈ 2.718
You can sharpen the sigmoid by scaling its input: σ(t·z), where t controls the steepness. As t → ∞ it becomes the step function, but the gradient collapses to 0 almost everywhere, which is exactly the flaw we were trying to escape.
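A quick numeric sketch of that trade-off, using an arbitrary input z and sharpness values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.5                                  # a point slightly right of the jump
for t in [1, 10, 100]:
    out = sigmoid(t * z)                 # sharpened sigmoid, sigma(t*z)
    grad = t * out * (1 - out)           # d/dz of sigma(t*z)
    print(t, round(out, 4), round(grad, 6))
# As t grows the output snaps toward 1, step-like, but away from z = 0 the
# gradient collapses toward 0 -- nothing left for gradient descent to use.
```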
Sigmoid roughly mimics the step function's behavior while being differentiable everywhere. That unlocks gradient descent, backpropagation, and everything that follows.
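As a sketch of what that unlocks, here is a single sigmoid neuron trained by plain gradient descent on log loss; the toy OR-gate data and learning rate are arbitrary choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy, linearly separable data: the OR gate
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    for (x1, x2), target in data:
        a = sigmoid(w[0] * x1 + w[1] * x2 + b)   # forward pass
        delta = a - target                       # dLoss/dz for log loss
                                                 # (the sigmoid derivative cancels)
        w[0] -= lr * delta * x1                  # one gradient-descent step
        w[1] -= lr * delta * x2
        b -= lr * delta

for (x1, x2), target in data:
    a = sigmoid(w[0] * x1 + w[1] * x2 + b)
    print((x1, x2), target, round(a, 3))         # outputs approach the targets
```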
Why One Neuron Isn't Enough
A single neuron, regardless of activation function, can only draw a linear decision boundary: a single straight line (a hyperplane in higher dimensions) splitting the input space in two.
It can't learn XOR. It can't separate concentric circles. It can't model any pattern where the true boundary is curved.
To get nonlinear boundaries, you need to compose multiple neurons across layers. A single neuron is the atom; a network is the molecule.
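To make that concrete, here is a sketch of a two-layer network with hand-picked weights that computes XOR, something no single neuron can do; the weights are just one of many possible choices:

```python
def neuron(inputs, weights, bias):
    """A threshold neuron: weighted sum, then fire / don't fire."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two neurons carve two different lines through the plane.
    h_or = neuron([x1, x2], [1, 1], -0.5)    # fires unless both inputs are 0
    h_and = neuron([x1, x2], [1, 1], -1.5)   # fires only if both inputs are 1
    # Output layer: combine them -- "OR but not AND" is exactly XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # 0, 1, 1, 0
```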