MNIST-style digits as a running example

The running example is handwritten digit recognition on $28\times 28$ grayscale images, whose 784 pixels are flattened into a single input vector. Humans read a sloppy three at low resolution effortlessly, yet writing explicit rules for every stroke variant is impractical; that mapping is what the network is meant to learn.

Each input neuron holds one pixel brightness in $[0,1]$ (0 for black, 1 for white). These 784 numbers form the first layer. The output layer has 10 neurons, one per digit class. After a forward pass, the output neuron with the largest activation is the network's guess.
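The input encoding and the final "pick the largest output" step can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's own code: the helper names `flatten_image` and `predicted_digit` are hypothetical, and the sketch assumes pixels arrive as 0..255 integers that still need scaling to $[0,1]$.

```python
import numpy as np

def flatten_image(image_uint8):
    """Scale a 28x28 array of 0..255 pixel values to [0, 1] and flatten it to 784 numbers."""
    image = np.asarray(image_uint8, dtype=np.float64)
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {image.shape}")
    # Assumption: raw pixels are 8-bit, so dividing by 255 maps them into [0, 1].
    return (image / 255.0).reshape(784)

def predicted_digit(output_activations):
    """Return the digit class (0-9) whose output neuron has the largest activation."""
    activations = np.asarray(output_activations, dtype=np.float64)
    if activations.shape != (10,):
        raise ValueError(f"expected 10 output activations, got {activations.shape}")
    return int(np.argmax(activations))
```

Calling `predicted_digit` on the 10 output activations of a forward pass gives the network's guess; ties are resolved in favor of the lowest index, which is how `np.argmax` behaves.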

Between input and output sit the hidden layers. Their activations are latent representations: they are neither the raw pixels nor the final labels, and nothing in the data specifies them directly. The chapter uses two hidden layers of 16 neurons each; that width is a screen-friendly choice, not a universal law.
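To make the 784-16-16-10 layout concrete, here is a rough sketch of a forward pass through it. Several details are assumptions made only for illustration: the sigmoid activation in the hidden layers, the random initialization scheme, and all function names. The softmax on the output layer follows the output distribution the text refers to. With these layer sizes the network has 13,002 weights and biases in total.

```python
import numpy as np

# Layer widths from the text: 784 inputs, two hidden layers of 16, 10 outputs.
LAYER_SIZES = [784, 16, 16, 10]

def init_params(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer transition.

    Assumption: small random weights and zero biases; the text does not
    specify an initialization.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        biases = np.zeros(n_out)
        params.append((weights, biases))
    return params

def sigmoid(z):
    """Squash each value into (0, 1). Assumed hidden-layer activation."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    """Turn 10 raw scores into a probability distribution."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def forward(params, x):
    """Propagate a 784-vector through the hidden layers, then the softmax output."""
    a = x
    for weights, biases in params[:-1]:
        a = sigmoid(weights @ a + biases)
    out_weights, out_biases = params[-1]
    return softmax(out_weights @ a + out_biases)

def count_params(params):
    """Total number of weights and biases in the network."""
    return sum(w.size + b.size for w, b in params)
```

With untrained parameters the 10 outputs are close to uniform; only training gives them meaning.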

Training (next chapter) compares predictions with one-hot targets and drives down the cross-entropy loss of a softmax output distribution. Inference reuses the learned weights and runs forward propagation only, with no gradient pass. MNIST is a clean sandbox of centered, fixed-size digits; real vision adds clutter, viewpoint changes, occlusion, color, fine-grained categories, label noise, and domain shift.
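As a small preview of that loss, the sketch below computes the cross-entropy between a softmax distribution and a one-hot target for a single example. The function names and the `eps` guard against $\log 0$ are illustrative assumptions, not part of the chapter.

```python
import numpy as np

def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def one_hot(label, num_classes=10):
    """Target vector with 1 at the true digit and 0 elsewhere."""
    target = np.zeros(num_classes)
    target[label] = 1.0
    return target

def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy -sum_k target_k * log(probs_k) for one example.

    Assumption: eps is added inside the log only to avoid log(0).
    """
    return float(-np.sum(target * np.log(probs + eps)))

# Example: scores that favor digit 3, compared against the true label 3 and a wrong label 7.
scores = np.array([0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
probs = softmax(scores)
loss_correct = cross_entropy(probs, one_hot(3))
loss_wrong = cross_entropy(probs, one_hot(7))
```

Because the target is one-hot, the loss reduces to the negative log of the probability assigned to the true digit, so `loss_correct` is smaller than `loss_wrong` here.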
