Diffusion intuition: destroy, then learn to undo
Diffusion models run a forward process that gradually adds noise until the data is nearly indistinguishable from Gaussian noise, then train a network to reverse that corruption step by step. The learned reverse process removes noise in a way that stays consistent with the training data manifold [2].
Forward noising is Markovian: each step depends only on the previous one and adds a little Gaussian noise whose variance is set by a schedule $\beta_t$. Training pairs each corrupted sample with a target describing what a slightly denoised version should look like.
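The forward process described above can be sketched in a few lines. This is a minimal illustration on a single scalar, assuming a linear schedule; the function names and the endpoint values `1e-4` and `0.02` are illustrative choices, not something the card specifies. It also shows the standard closed form that jumps from clean data to any step directly, using the cumulative product of $1 - \beta_t$.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear schedule; the endpoints are illustrative assumptions."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar[t] is the product of (1 - beta_s) for all s <= t."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def markov_step(x_prev, beta, rng):
    """One forward step: shrink the signal slightly, then add N(0, beta) noise."""
    return math.sqrt(1.0 - beta) * x_prev + math.sqrt(beta) * rng.gauss(0.0, 1.0)


def q_sample(x0, t, alpha_bar, eps):
    """Closed-form jump from clean x0 straight to step t (0-indexed)."""
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * eps


betas = linear_beta_schedule()
alpha_bar = cumulative_alpha_bar(betas)

# Walk the chain one step at a time, starting from a "clean" value of 1.5.
rng = random.Random(0)
x = 1.5
for beta in betas:
    x = markov_step(x, beta, rng)
print("signal fraction left at the last step:", math.sqrt(alpha_bar[-1]))
```

Because `alpha_bar` shrinks toward zero, almost none of the original signal survives the last step, which is what lets sampling start from pure Gaussian noise.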
At each noise level, the network is trained to approximate the conditional expectation of slightly cleaner data given a noised sample (equivalently, to predict the noise $\epsilon$ or a score function). Sampling iterates many small denoising updates, so more steps generally buy fidelity at the cost of latency [2].
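To make the "conditional expectation" point concrete, here is a hedged sketch of the noise-prediction loss on a toy problem. It assumes a one-dimensional Gaussian dataset (mean 2.0, standard deviation 0.5), chosen only because $E[\epsilon \mid x_t]$ then has a closed form; a real model would learn this with a network and would receive the timestep rather than the cumulative signal level `ab`. All names are hypothetical.

```python
import math
import random

DATA_MEAN, DATA_STD = 2.0, 0.5  # toy dataset x0 ~ N(2.0, 0.5^2), an assumption


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def optimal_eps(x_t, ab):
    """E[eps | x_t] in closed form; available only because the toy data is Gaussian."""
    var_xt = ab * DATA_STD ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * DATA_MEAN) / var_xt


def zero_eps(x_t, ab):
    """A trivial baseline that always predicts zero noise."""
    return 0.0


def noise_prediction_loss(predict_eps, alpha_bar, rng, num_samples=5000):
    """Monte Carlo estimate of E[(eps - eps_hat(x_t))^2] over random noise levels."""
    total = 0.0
    for _ in range(num_samples):
        x0 = rng.gauss(DATA_MEAN, DATA_STD)
        ab = alpha_bar[rng.randrange(len(alpha_bar))]
        eps = rng.gauss(0.0, 1.0)
        x_t = math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps
        total += (eps - predict_eps(x_t, ab)) ** 2
    return total / num_samples


rng = random.Random(0)
alpha_bar = make_alpha_bar()
loss_optimal = noise_prediction_loss(optimal_eps, alpha_bar, rng)
loss_zero = noise_prediction_loss(zero_eps, alpha_bar, rng)
print("conditional-expectation predictor:", loss_optimal)
print("always-zero predictor:", loss_zero)
```

The conditional-expectation predictor should score a lower squared error than the baseline, since the mean squared error is minimized by exactly that expectation.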

Classifier-free guidance blends conditional and unconditional score estimates, scaling the gap to trade prompt fidelity against sample diversity [2].
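The blend can be written in one line. The sketch below uses one common convention, in which a scale of 0 returns the unconditional estimate and a scale of 1 returns the conditional one; some formulations parameterize the weight differently, so treat the function name and the scale convention as assumptions.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Move from the unconditional estimate toward (and past) the conditional one.

    Both estimates are plain lists of floats standing in for noise tensors.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 1.0]

# Scales above 1 extrapolate beyond the conditional estimate.
for scale in (0.0, 1.0, 2.0):
    print(scale, classifier_free_guidance(eps_uncond, eps_cond, scale))
```

Larger scales push samples harder toward the prompt, which is where the loss of diversity comes from.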

Unlike a single forward pass in classification, generation is a loop: each step applies the same weights at a different noise level.
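A minimal sketch of that loop, assuming DDPM-style ancestral sampling with the step variance set to $\beta_t$. The "network" here is a stand-in: a closed-form noise predictor for a toy Gaussian dataset (mean 2.0, standard deviation 0.5), so the example runs without training. The schedule endpoints, step count, and all names are illustrative assumptions.

```python
import math
import random


def make_schedule(num_steps=200, beta_start=1e-4, beta_end=0.05):
    """Assumed linear schedule; returns per-step betas and cumulative alpha_bar."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return betas, alpha_bar


def toy_eps_model(x_t, ab, data_mean=2.0, data_std=0.5):
    """Stand-in for a trained network: exact E[eps | x_t] for Gaussian toy data."""
    var_xt = ab * data_std ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * data_mean) / var_xt


def ddpm_sample(eps_model, betas, alpha_bar, rng):
    """Start from pure noise and apply the same model once per noise level."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, alpha_bar[t])
        # Remove the predicted noise contribution, then undo the signal shrink.
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bar[t]) * eps_hat) \
            / math.sqrt(1.0 - betas[t])
        # Fresh noise is added at every step except the final one.
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return x


betas, alpha_bar = make_schedule()
rng = random.Random(0)
samples = [ddpm_sample(toy_eps_model, betas, alpha_bar, rng) for _ in range(200)]
sample_mean = sum(samples) / len(samples)
print("mean of generated samples:", sample_mean)
```

The only thing that changes between iterations is the noise level passed to the model; the model itself is reused unchanged, and the generated samples should cluster around the toy data mean.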
DDPM-style training adds noise on a schedule; the network learns to invert it. The score-matching view unifies these objectives, but implementations still expose step count and variance choices that affect sample quality.
Scheduler choice (DDIM, DPM-Solver, etc.) changes the noise trajectory at inference; the same weights can look sharp or mushy depending on step spacing.
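To illustrate how step spacing enters, here is a hedged sketch of a deterministic DDIM-style update that visits only an evenly strided subset of the training timesteps. The noise predictor is again a closed-form stand-in for a toy Gaussian dataset rather than a trained network, and the stride rule, schedule values, and names are assumptions for illustration.

```python
import math


def make_alpha_bar(num_steps=200, beta_start=1e-4, beta_end=0.05):
    """Cumulative signal level for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def toy_eps_model(x_t, ab, data_mean=2.0, data_std=0.5):
    """Stand-in for a trained network: exact E[eps | x_t] for Gaussian toy data."""
    var_xt = ab * data_std ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * data_mean) / var_xt


def strided_timesteps(num_train_steps, num_inference_steps):
    """Evenly spaced timesteps, from the noisiest level down."""
    stride = max(1, num_train_steps // num_inference_steps)
    return list(range(num_train_steps - 1, -1, -stride))


def ddim_sample(eps_model, alpha_bar, num_inference_steps, x_start):
    """Deterministic DDIM-style sampling (no noise is injected between steps)."""
    timesteps = strided_timesteps(len(alpha_bar), num_inference_steps)
    x = x_start
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t]
        # After the last visited step, jump to the clean-data level (alpha_bar = 1).
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps_hat = eps_model(x, ab_t)
        # Estimate the clean sample, then re-noise it to the next, lower level.
        x0_hat = (x - math.sqrt(1.0 - ab_t) * eps_hat) / math.sqrt(ab_t)
        x = math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat
    return x


alpha_bar = make_alpha_bar()
coarse = ddim_sample(toy_eps_model, alpha_bar, num_inference_steps=5, x_start=0.7)
fine = ddim_sample(toy_eps_model, alpha_bar, num_inference_steps=50, x_start=0.7)
print("5 steps:", coarse, "50 steps:", fine)
```

Both calls use the same predictor and the same starting noise; only the set of visited noise levels differs, which is the knob a scheduler controls.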
Prompt conditioning injects text embeddings via cross-attention layers inside the U-Net, analogous to encoder-decoder attention in language models [2].
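The attention pattern itself can be sketched without any framework. In the toy example below, queries come from image positions while keys and values come from text tokens; the learned projection matrices and multiple heads used in practice are omitted, and all vectors are made-up values for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_queries, text_keys, text_values):
    """Scaled dot-product attention: each image position reads from the text tokens."""
    dim = len(text_keys[0])
    outputs, weights = [], []
    for query in image_queries:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in text_keys]
        row = softmax(scores)
        weights.append(row)
        # Weighted average of the text value vectors for this image position.
        outputs.append([sum(w * value[c] for w, value in zip(row, text_values))
                        for c in range(len(text_values[0]))])
    return outputs, weights


# Three image positions attending over two text tokens (illustrative numbers).
image_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_keys = [[2.0, 0.0], [0.0, 2.0]]
text_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

outputs, weights = cross_attention(image_queries, text_keys, text_values)
for row in weights:
    print(row)
```

Each image position ends up with its own mixture of text-token values, which is how different regions of the image can respond to different words in the prompt.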
Sources
Card Info
- Topic: Machine learning
- Difficulty: Intermediate