Diffusion intuition: destroy, then learn to undo

Diffusion models run a forward process that gradually adds noise until the data is nearly indistinguishable from Gaussian noise, then train a network to reverse that corruption step by step. The learned reverse process removes noise in a way that stays consistent with the training data manifold [2].

Forward noising is Markovian: each step adds a small amount of Gaussian noise whose variance is set by a schedule $\beta_t$. Training pairs each corrupted sample with a target describing what a slightly denoised version should look like.
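The forward process can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the linear schedule endpoints (`1e-4` to `0.02`), the step count, and the function names are assumptions chosen for the example. It shows both the step-by-step Markov chain and the standard closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, where $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, which lets training jump straight to any noise level.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: variances grow linearly from beta_start to beta_end.
    return np.linspace(beta_start, beta_end, num_steps)

def forward_step(x_prev, beta_t, rng):
    # One Markov step: shrink the signal slightly, add Gaussian noise of variance beta_t.
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def noise_to_step(x0, t, betas, rng):
    # Closed form: sample x_t directly from x_0 without iterating over earlier steps.
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)

# Toy "data": 512 values centered at 3.0, clearly not standard Gaussian.
x0 = 3.0 + rng.standard_normal((64, 8))

# Run the full chain; the signal is progressively washed out.
x_T = x0
for beta_t in betas:
    x_T = forward_step(x_T, beta_t, rng)

# Fraction of the original signal that survives at the last step.
alpha_bar_T = np.cumprod(1.0 - betas)[-1]
print("surviving signal scale:", np.sqrt(alpha_bar_T))
print("mean and std after noising:", x_T.mean(), x_T.std())
```

The pair `(x_t, eps)` returned by `noise_to_step` is exactly the kind of training pair described above: a corrupted sample together with the noise that produced it.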

At each noise level, the network approximates the conditional expectation of slightly cleaner data given a noised sample (equivalently, it predicts the noise $\epsilon$ or a score function). Sampling iterates many small denoising updates, so latency trades off against fidelity [2].
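The equivalence between predicting noise, predicting clean data, and predicting a score can be checked numerically. The sketch below assumes the common variance-preserving parameterization $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$; the helper names are hypothetical, and the true noise stands in for a trained network's output.

```python
import numpy as np

def x0_from_eps(x_t, eps, alpha_bar_t):
    # Invert the noising equation: a noise estimate implies a clean-data estimate.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def score_from_eps(eps, alpha_bar_t):
    # Score of the Gaussian q(x_t | x_0) with respect to x_t, written in terms of eps.
    return -eps / np.sqrt(1.0 - alpha_bar_t)

def eps_prediction_loss(eps_pred, eps_true):
    # Simple mean-squared error between predicted and true noise.
    return float(np.mean((eps_pred - eps_true) ** 2))

rng = np.random.default_rng(0)
alpha_bar_t = 0.5  # assumed noise level for illustration
x0 = rng.standard_normal((4, 8))
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# With the true noise in hand, the clean sample is recovered exactly.
x0_recovered = x0_from_eps(x_t, eps, alpha_bar_t)

# The same noise also gives the conditional score directly.
score = score_from_eps(eps, alpha_bar_t)
score_direct = -(x_t - np.sqrt(alpha_bar_t) * x0) / (1.0 - alpha_bar_t)

print("max recovery error:", np.abs(x0_recovered - x0).max())
print("max score mismatch:", np.abs(score - score_direct).max())
```

A trained network only approximates `eps`, so in practice the recovered `x0` and the score are estimates whose quality varies with the noise level.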

Classifier-free guidance blends conditional and unconditional score estimates, scaling the gap to trade prompt fidelity against sample diversity [2].
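The blend can be written as a one-line extrapolation. The sketch below uses one common convention, in which a scale of 0 gives the unconditional estimate and a scale of 1 gives the conditional one; other write-ups parameterize the same idea slightly differently. The function and argument names are assumptions for illustration, and the arrays stand in for network outputs.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    # Start from the unconditional estimate and move along the
    # (conditional - unconditional) direction, scaled by guidance_scale.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Stand-ins for two forward passes of the same network:
# one with the prompt, one with an empty or null prompt.
eps_uncond = np.zeros((2, 3))
eps_cond = np.ones((2, 3))

for scale in (0.0, 1.0, 7.5):
    guided = classifier_free_guidance(eps_uncond, eps_cond, scale)
    print(scale, guided[0])
```

Scales above 1 push samples further toward the prompt than the conditional estimate alone, which tends to improve prompt fidelity at the cost of diversity.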

Unlike classification, which needs a single forward pass, generation is a loop: each step applies the same weights at a different noise level.
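The loop structure can be sketched with a DDPM-style ancestral sampler. To keep the example self-contained and checkable, a hypothetical `oracle_eps` replaces the trained network: it returns the exact noise for a toy dataset consisting of a single point `mu`. The schedule values and the choice of $\sigma_t^2 = \beta_t$ are assumptions made for this illustration.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    # The same model is called at every step; only the noise level t changes.
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, t)
        # Mean of the reverse step implied by the noise estimate.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a small amount of noise on all but the final step.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x

# Toy setup: the "dataset" is the single point mu, so the ideal noise
# prediction has a closed form and no training is needed.
mu = np.array([1.0, -2.0, 0.5])
betas = np.linspace(1e-4, 0.02, 200)
alpha_bars = np.cumprod(1.0 - betas)

def oracle_eps(x, t):
    # Hypothetical stand-in for a trained network.
    return (x - np.sqrt(alpha_bars[t]) * mu) / np.sqrt(1.0 - alpha_bars[t])

sample = ddpm_sample(oracle_eps, (5, 3), betas, np.random.default_rng(0))
print(sample)
```

With a real network, `eps_model` would be a single set of weights conditioned on `t`, and the cost of sampling would scale with the number of loop iterations.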

DDPM-style training adds noise on a schedule, and the network learns to invert it. Score-matching views unify the objectives, but implementations still expose step-count and variance choices that affect sample quality.
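As one illustration of the variance choice, DDPM-style samplers commonly use either $\sigma_t^2 = \beta_t$ or the smaller posterior variance $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$ for the reverse step. The sketch below computes both for an assumed linear schedule at two step counts; the function name and schedule values are illustrative only.

```python
import numpy as np

def reverse_variance_options(betas):
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    # alpha_bar at the previous step, with alpha_bar = 1 before any noise is added.
    alpha_bars_prev = np.concatenate(([1.0], alpha_bars[:-1]))
    # Posterior variance of q(x_{t-1} | x_t, x_0).
    beta_tilde = (1.0 - alpha_bars_prev) / (1.0 - alpha_bars) * betas
    return betas, beta_tilde

for num_steps in (50, 1000):
    betas = np.linspace(1e-4, 0.02, num_steps)
    upper, lower = reverse_variance_options(betas)
    # Compare the two options at the first and last steps of the schedule.
    print(num_steps, "steps:", upper[0], lower[0], upper[-1], lower[-1])
```

The two options agree closely at high noise levels and differ most near the end of sampling, which is one reason the choice can matter for fine detail.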

Scheduler choice (DDIM, DPM-Solver, etc.) changes the noise trajectory at inference; the same weights can produce sharp or mushy samples depending on step spacing.
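To make step spacing concrete, here is a minimal sketch of a deterministic DDIM-style sampler that visits only a subset of the training timesteps. The evenly spaced subsequence, the schedule values, and the hypothetical `oracle_eps` (an exact noise predictor for a toy single-point dataset, standing in for a trained network) are all assumptions for illustration.

```python
import numpy as np

def ddim_sample(eps_model, shape, alpha_bars, num_steps, rng):
    total = len(alpha_bars)
    # Evenly spaced subset of training timesteps, visited from noisy to clean.
    timesteps = np.linspace(0, total - 1, num_steps).round().astype(int)[::-1]
    x = rng.standard_normal(shape)
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # Noise level of the next visited step; 1.0 means fully clean.
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps_hat = eps_model(x, t)
        # Estimate the clean sample, then re-noise it to the next level
        # using the same noise estimate (no fresh randomness).
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

mu = np.array([1.0, -2.0, 0.5])
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

def oracle_eps(x, t):
    # Hypothetical stand-in for a trained network.
    return (x - np.sqrt(alpha_bars[t]) * mu) / np.sqrt(1.0 - alpha_bars[t])

few = ddim_sample(oracle_eps, (4, 3), alpha_bars, 10, np.random.default_rng(0))
many = ddim_sample(oracle_eps, (4, 3), alpha_bars, 50, np.random.default_rng(0))
print(few[0], many[0])
```

With a perfect noise predictor, 10 and 50 steps give the same result. With a real, imperfect network the two diverge, because errors accumulate differently along each trajectory.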

Prompt conditioning injects text embeddings via cross-attention layers inside the U-Net, analogous to encoder-decoder attention in language models [2].
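A single-head cross-attention layer can be sketched as follows. Queries come from the image feature tokens and keys and values come from the text embeddings, so each spatial location decides which prompt tokens to read from. All shapes, weight matrices, and names here are hypothetical; real U-Net blocks add multiple heads, normalization, and residual connections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    q = image_tokens @ w_q  # (n_img, d): queries from image features
    k = text_tokens @ w_k   # (n_txt, d): keys from text embeddings
    v = text_tokens @ w_v   # (n_txt, d): values from text embeddings
    # Each image token gets a distribution over the text tokens.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_img, n_txt, d_img, d_txt, d = 16, 5, 32, 24, 8  # assumed toy sizes
image_tokens = rng.standard_normal((n_img, d_img))
text_tokens = rng.standard_normal((n_txt, d_txt))
w_q = rng.standard_normal((d_img, d)) / np.sqrt(d_img)
w_k = rng.standard_normal((d_txt, d)) / np.sqrt(d_txt)
w_v = rng.standard_normal((d_txt, d)) / np.sqrt(d_txt)

out, weights = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```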

Sources

[2] https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
