What Does a Generative Model Do?
Every model you have built so far has answered the same question: given input $x$, what is $y$? A linear regression predicts a price. A classifier predicts a category. A transformer predicts the next token. All of these are discriminative models — they learn the conditional distribution $p(y \mid x)$.
Generative AI — the technology behind ChatGPT, Stable Diffusion, and GitHub Copilot — is entirely built on generative models. Autoencoders, VAEs, GANs, and diffusion models are the four foundational architectures this unit covers. Understanding them means understanding how AI can create, not just classify.
A generative model asks something fundamentally different: what does a typical $x$ look like? It learns the distribution $p(x)$ over the data itself — not over labels, but over raw inputs. With this distribution you can:
- Sample new examples that look like the training data (generate images, text, audio)
- Evaluate likelihood: assign a score $p(x)$ to any input
- Detect anomalies: a sample with very low $p(x)$ is unusual
- Complete partial inputs: given the first half of an image, infer the rest
- Understand structure: the distribution reveals what the data "cares about"
These capabilities are qualitatively different from classification. You are not labeling — you are modeling reality.
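To make these capabilities concrete, here is a minimal sketch using scikit-learn's `GaussianMixture` as a stand-in generative model. The two-cluster toy dataset and the anomaly threshold are illustrative assumptions, not part of this lesson; the point is the interface, which the neural models in this unit provide at much larger scale.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy "real data": two clusters standing in for a data distribution p(x)
data = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(5, 1, (500, 2))])

gm = GaussianMixture(n_components=2).fit(data)    # learn p(x)

new_points, _ = gm.sample(10)                     # sample: draw new x ~ p(x)
log_px = gm.score_samples(data[:5])               # evaluate: log p(x) per point

outlier = np.array([[20.0, -20.0]])               # far from both clusters
is_anomaly = gm.score_samples(outlier) < -25      # detect: low log p(x)
```

A Gaussian mixture can only model simple, blob-shaped distributions; the rest of this unit is about getting the same three operations to work on images and text.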
Why Is This Hard?
Consider a 256 × 256 color image: $256 \times 256 \times 3 = 196{,}608$ pixel values. A meaningful probability distribution must assign a number to every possible image — and the vast, overwhelming majority of random pixel arrangements look like static, not photographs.
The space has $256^{196{,}608}$ configurations. You cannot store a table. You cannot fit a histogram. You must find some compact parameterized structure that concentrates probability mass exactly where real images live.
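As a quick sanity check on that scale (a back-of-the-envelope computation, not from the lesson), you can count the digits of the configuration count directly:

```python
import math

pixels = 256 * 256 * 3          # 196,608 values, each one of 256 levels
digits = math.floor(pixels * math.log10(256)) + 1
print(digits)                   # ~473,480 digits; no table can enumerate this
```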
Three Approaches
Over the past decade, three broad strategies have emerged for tractable generative modeling.
1 — Autoregressive Models
Apply the chain rule of probability to factor the joint distribution one dimension at a time:

$$p(x) = \prod_{i=1}^{D} p(x_i \mid x_{<i})$$

- $p(x)$: joint probability of the full data point $x$
- $x_i$: the $i$-th component (e.g., one pixel, one token)
- $x_{<i}$: all components before $i$
Each factor is modeled by a neural network conditioned on the previous values. GPT is an autoregressive model over tokens. PixelCNN is an autoregressive model over pixels. The advantage: exact likelihoods, stable training. The disadvantage: slow sequential sampling.
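The sequential sampling loop looks like this in a minimal PyTorch sketch, assuming a hypothetical `model` that maps a prefix of token ids to next-token logits (the `bos_id` start token is also an assumption):

```python
import torch

@torch.no_grad()
def sample_autoregressive(model, seq_len, bos_id=0):
    # Start from a single beginning-of-sequence token.
    x = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(seq_len):
        logits = model(x)[:, -1, :]              # unnormalized p(x_i | x_<i)
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, 1)   # draw one component
        x = torch.cat([x, next_tok], dim=1)      # condition on it and repeat
    return x
```

Every iteration runs the full network once, which is why sampling cost grows linearly with the number of dimensions.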
2 — Latent Variable Models
Introduce a hidden variable $z$ and write:

$$p(x) = \int p(x \mid z)\, p(z)\, dz$$

- $p(x)$: marginal distribution of observations
- $p(x \mid z)$: the likelihood, how $z$ generates $x$
- $p(z)$: prior over latent variables (often $\mathcal{N}(0, I)$)
- $\int \cdots \, dz$: integrate over all possible $z$ values
The idea: the high-dimensional data $x$ is a noisy, transformed view of a simpler low-dimensional latent code $z$. Variational Autoencoders (VAEs) and diffusion models fall here.
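Here is a NumPy sketch of both operations for a toy linear-Gaussian latent variable model, where everything is tractable (the dimensions, weights, and noise variance are illustrative assumptions): ancestral sampling from $p(z)$ then $p(x \mid z)$, and a Monte Carlo estimate of the marginal $p(x)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x = 2, 5                       # latent and observed dimensions
W = rng.normal(size=(d_x, d_z))       # toy linear "decoder"
var = 0.01                            # observation noise variance

# Ancestral sampling: z ~ N(0, I), then x ~ p(x|z) = N(Wz, var * I)
z = rng.normal(size=d_z)
x = W @ z + np.sqrt(var) * rng.normal(size=d_x)

def log_p_x_given_z(x, z):
    # Log density of the diagonal Gaussian N(Wz, var * I)
    return -0.5 * np.sum((x - W @ z) ** 2 / var + np.log(2 * np.pi * var))

# Monte Carlo estimate of p(x) = ∫ p(x|z) p(z) dz using prior samples,
# averaged in log space for numerical stability (log-sum-exp trick)
zs = rng.normal(size=(10_000, d_z))
logs = np.array([log_p_x_given_z(x, zi) for zi in zs])
m = logs.max()
log_px = m + np.log(np.mean(np.exp(logs - m)))
```

A VAE replaces the linear decoder with a neural network, at which point this integral becomes intractable and the ELBO of lesson 14-4 takes over.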
3 — Implicit Models (GANs)
Instead of writing down a formula for $p(x)$, train a sampler directly. A generator network $G$ maps noise $z \sim p(z)$ to samples $x = G(z)$. The distribution of $G(z)$ is the implicit model. A discriminator network provides the training signal. You never compute $p(x)$ explicitly — but you can generate samples.
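A compact PyTorch sketch of one adversarial update, with toy network sizes chosen for illustration (this uses the standard non-saturating generator loss; it is a sketch under those assumptions, not a tuned recipe):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 16, 2
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x_real):
    b = x_real.size(0)
    # Discriminator update: push real toward 1, fake toward 0
    x_fake = G(torch.randn(b, z_dim)).detach()   # don't backprop into G here
    loss_d = bce(D(x_real), torch.ones(b, 1)) + bce(D(x_fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator update: fool the discriminator (non-saturating loss)
    x_fake = G(torch.randn(b, z_dim))
    loss_g = bce(D(x_fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Alternating these two updates on real data batches is the minimax game that lesson 14-6 develops, and its failure modes are the subject of lesson 14-7.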
A Roadmap for This Unit
This unit builds each approach from scratch:
| Lesson | Model | Core idea |
|---|---|---|
| 14-2 | Autoencoder | Bottleneck reconstruction |
| 14-3 | VAE | Stochastic encoder |
| 14-4 | ELBO | Principled VAE objective |
| 14-5 | Reparameterization | Differentiating through sampling |
| 14-6 | GAN | Adversarial training |
| 14-7 | GAN dynamics | Mode collapse and fixes |
| 14-8 | Diffusion (forward) | Scheduled noising |
| 14-9 | Diffusion (reverse) | DDPM denoising |
| 14-10 | Score matching | Unified theory |
Every model in this unit is an answer to the same question: how do we compress a complex data distribution into a trainable neural network? Each gives a different trade-off between tractability, sample quality, and training stability.
*Interactive example (coming soon): compare samples from autoregressive, VAE, GAN, and diffusion models on the same dataset.*