The Simplest Possible Brain Cell
A neuron takes a bunch of numbers, weighs each one by importance, adds them up, and produces a single output. That's it. The biological framing is evocative but the math is completely elementary.
The perceptron is the ancestor of every modern neural network. Understanding what a single neuron computes — and why it's limited — is the foundation for everything that follows, from deep learning to transformers.
Think of a single neuron like a simple scoring formula. Imagine you're deciding whether to approve a loan application. You have three signals: income (say, $80K), years of employment (5 years), and existing debt ($20K). You don't treat them equally — income matters most, debt is a red flag, employment matters a bit. So you multiply each signal by its importance score, add them up, and compare to a threshold. That is a perceptron.
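Here's that scoring formula as a quick Python sketch; the dollar amounts, importance scores, and threshold are made up for illustration:

```python
# A hand-wired "loan scoring" neuron: weighted sum of signals vs. a threshold.
# All weights and the threshold below are invented for illustration.
income = 80      # thousands of dollars
employment = 5   # years
debt = 20        # thousands of dollars

w_income, w_employment, w_debt = 0.5, 0.3, -0.8   # importance scores (debt counts against)
threshold = 30

score = w_income * income + w_employment * employment + w_debt * debt
print(score)                                       # 25.5
print("approve" if score >= threshold else "deny")  # deny
```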
Here's the full computation for a single artificial neuron:
Step 1 — Weighted sum:
z = w₁·x₁ + w₂·x₂ + … + wₙ·xₙ + b

- wᵢ: weight for input i - how important is this input?
- xᵢ: input value i
- b: bias - shifts the result up or down
- z: the pre-activation sum
Step 2 — Activation:
a = σ(z)

- a: neuron output (its 'activation')
- σ: the sigmoid function, or any other nonlinearity
That's the whole neuron. If you use a sigmoid activation, then a = σ(w₁·x₁ + … + wₙ·xₙ + b). That is exactly logistic regression: take a weighted sum, then squash it into the range (0, 1). A single sigmoid neuron IS logistic regression. Different vocabulary, same math.
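A minimal Python sketch of that two-step computation, with placeholder weights and inputs:

```python
import math

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Step 1: weighted sum. Step 2: nonlinearity. That's the whole neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # pre-activation
    return sigmoid(z)                                        # activation

# Arbitrary example values (same three signals as the loan example above)
x = [80, 5, 20]            # income, years employed, debt
w = [0.05, 0.3, -0.08]     # importance of each signal
b = -4.0                   # bias shifts the decision threshold

print(neuron(x, w, b))     # a number in (0, 1), readable as P(approve)
```

Fit w and b to labeled data (say, past loan decisions) and this is textbook logistic regression.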
The Original Perceptron (1958)
Frank Rosenblatt's perceptron used a step function instead of sigmoid:

output = 1 if z ≥ 0, otherwise 0
The intuition is compelling — the neuron either fires or it doesn't, like a biological neuron. But it has a fatal flaw.
The step function's derivative is zero everywhere (and undefined at the jump), which makes training by gradient descent impossible.
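A tiny numeric sketch of the problem, with arbitrary values: nudging a weight under a step activation usually changes nothing at all, so there is no gradient signal to follow.

```python
def step(z):
    return 1 if z >= 0 else 0

x, w, b = 2.0, 0.3, -1.0                 # arbitrary single-input neuron
out = step(w * x + b)                    # z = -0.4  -> output 0

# Nudge the weight slightly, as gradient descent would:
out_nudged = step((w + 0.01) * x + b)    # z = -0.38 -> output still 0
print(out, out_nudged)                   # 0 0 -- no change, so no gradient to follow
```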
The original perceptron was trained with a special rule that only worked for linearly separable data. In 1969, Minsky and Papert showed perceptrons couldn't solve XOR — a non-linearly-separable problem. Research funding dried up. The 1970s AI winter followed.
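For the curious, here is a sketch of that training rule (the classic perceptron update, with made-up hyperparameters): it converges on a linearly separable problem like AND, but cycles forever on XOR.

```python
def train_perceptron(data, epochs=100, lr=1.0):
    """Classic perceptron rule: w += lr * (target - prediction) * x."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            err = target - pred
            if err != 0:
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
                errors += 1
        if errors == 0:            # converged: every point classified correctly
            return w, b, True
    return w, b, False             # never converged within the epoch budget

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND))   # converges: AND is linearly separable
print(train_perceptron(XOR))   # never converges: XOR is not
```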
Making It Differentiable
The fix: replace the step function with a smooth approximation. Sigmoid is S-shaped, bounded between 0 and 1, and differentiable everywhere. The letter e here is Euler's number, approximately 2.718, a mathematical constant that appears naturally in exponential growth and decay. The minus sign in e^(−z) means: for large positive z, e^(−z) is near 0, so the output is near 1. For large negative z, e^(−z) is huge, so the output is near 0:

σ(z) = 1 / (1 + e^(−z))
- e: Euler's number ≈ 2.718
You can sharpen the sigmoid by scaling its input: σ(t·z), where t controls the steepness. As t → ∞ it becomes the step function, but the gradient collapses to 0 almost everywhere, which is exactly the flaw we were trying to escape.
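A quick numeric sketch of that trade-off, using an arbitrary input z and sharpness values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.5                                  # a point slightly right of the jump
for t in [1, 10, 100]:
    out = sigmoid(t * z)                 # sharpened sigmoid, sigma(t*z)
    grad = t * out * (1 - out)           # d/dz of sigma(t*z)
    print(t, round(out, 4), round(grad, 6))
# As t grows the output snaps toward 1, step-like, but away from z = 0 the
# gradient collapses toward 0 -- nothing left for gradient descent to use.
```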
Sigmoid roughly mimics the step function's behavior while being differentiable everywhere. That unlocks gradient descent, backpropagation, and everything that follows.
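As a sketch of what that unlocks, here is a single sigmoid neuron trained by plain gradient descent on log loss; the toy OR-gate data and learning rate are arbitrary choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy, linearly separable data: the OR gate
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    for (x1, x2), target in data:
        a = sigmoid(w[0] * x1 + w[1] * x2 + b)   # forward pass
        delta = a - target                       # dLoss/dz for log loss
                                                 # (the sigmoid derivative cancels)
        w[0] -= lr * delta * x1                  # one gradient-descent step
        w[1] -= lr * delta * x2
        b -= lr * delta

for (x1, x2), target in data:
    a = sigmoid(w[0] * x1 + w[1] * x2 + b)
    print((x1, x2), target, round(a, 3))         # outputs approach the targets
```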
Why One Neuron Isn't Enough
A single neuron, regardless of activation function, can only draw a linear decision boundary: a single straight line (a hyperplane in higher dimensions) splitting the input space in two.
It can't learn XOR. It can't separate concentric circles. It can't model any pattern where the true boundary is curved.
To get nonlinear boundaries, you need to compose multiple neurons across layers. A single neuron is the atom; a network is the molecule.
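To make that concrete, here is a sketch of a two-layer network with hand-picked weights that computes XOR, something no single neuron can do; the weights are just one of many possible choices:

```python
def neuron(inputs, weights, bias):
    """A threshold neuron: weighted sum, then fire / don't fire."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two neurons carve two different lines through the plane.
    h_or = neuron([x1, x2], [1, 1], -0.5)    # fires unless both inputs are 0
    h_and = neuron([x1, x2], [1, 1], -1.5)   # fires only if both inputs are 1
    # Output layer: combine them -- "OR but not AND" is exactly XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # 0, 1, 1, 0
```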