Parameter counting and width vs depth intuition

Beginner · Machine learning
Created by Best · 01.06.2026 at 06:20 UTC

Every weight and bias is a learnable knob. The demo network connects all 784 inputs to each of the 16 neurons in the first hidden layer. That alone is $784\times 16 = 12{,}544$ weights plus 16 biases, and similar matrices sit between subsequent layers.
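
As a worked check of the first-layer count, using only the 784 inputs and 16 neurons stated above (the symbols $n_{\text{in}}$ and $n_{\text{out}}$ are notation introduced here for illustration, not taken from the card):

$$
\underbrace{n_{\text{in}}\, n_{\text{out}}}_{\text{weights}} + \underbrace{n_{\text{out}}}_{\text{biases}} = 784 \times 16 + 16 = 12{,}544 + 16 = 12{,}560.
$$

The same pattern applies to any fully connected layer: one weight per input-output pair, plus one bias per output neuron.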

Counting everything, the chapter network has on the order of 13,000 parameters. Learning means searching for a setting of those numbers that reduces average classification error on examples, which would be a staggering hand-tuning problem if you imagine setting each dial manually.
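
The total can be reproduced with a short count. This is a minimal sketch that assumes layer widths of 784, 16, 16, and 10; only the 784 inputs and the first 16-neuron hidden layer are stated in this card, so the second hidden layer and the 10-class output are assumptions chosen to match the "on the order of 13,000" figure. The names `dense_layer_params` and `layer_sizes` are illustrative.

```python
def dense_layer_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

# Assumed widths: only 784 and the first 16 come from the card text.
layer_sizes = [784, 16, 16, 10]

# Pair each layer with the next one and count the parameters between them.
pairs = list(zip(layer_sizes, layer_sizes[1:]))
per_layer = [dense_layer_params(n_in, n_out) for n_in, n_out in pairs]
total = sum(per_layer)

for (n_in, n_out), count in zip(pairs, per_layer):
    print(f"{n_in:>4} -> {n_out:<3} {count:>6} parameters")
print(f"total       {total:>6} parameters")
```

Under these assumed widths, almost all of the parameters sit in the first layer, because that is where the 784 pixels enter.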

Fully connecting every pixel to a wide hidden layer scales poorly as resolution grows; convolutional architectures (later courses) share weights across spatial positions to exploit locality and translation structure. A single-layer perceptron without hidden units implements a linear decision boundary in raw pixel space, so it handles only problems that are linearly separable in that space.
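
To make the scaling concrete, the sketch below compares the first-layer parameter count of a fully connected layer with that of a convolutional layer as the image side length grows. The specific choices (grayscale input, 16 hidden units or filters, a 3×3 kernel, and the listed side lengths) are assumptions for illustration, not values from the card.

```python
def dense_first_layer_params(side, hidden_units=16):
    """Every pixel of a side x side grayscale image connects to every hidden unit."""
    n_pixels = side * side
    return n_pixels * hidden_units + hidden_units

def conv_layer_params(kernel=3, in_channels=1, filters=16):
    """Each filter reuses one small kernel at every position, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

for side in (28, 56, 112, 224):
    dense = dense_first_layer_params(side)
    conv = conv_layer_params()
    print(f"{side:>3}x{side:<3} dense: {dense:>7}  conv: {conv:>4}")
```

The dense count grows with the number of pixels, so doubling the side length roughly quadruples it, while the convolutional count does not depend on image size because the same kernel weights are reused at every position.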

Depth helps composition: early layers can build edges, later layers assemble them into parts, and the head combines parts into classes. The same parameter budget spread across layers can represent hierarchical features rather than one monolithic mixing step. The layered story is about reusable parts, not one giant lookup table.
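
One way to see what "the same parameter budget" means is to compare a deeper network with the widest single-hidden-layer network that fits in the same number of parameters. This is a rough sketch under assumed layer widths (784, 16, 16, 10); the helper names `mlp_params` and `widest_single_hidden` are hypothetical, and the count says nothing about which features a trained network actually learns.

```python
def mlp_params(sizes):
    """Total weights and biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def widest_single_hidden(budget, n_in, n_out):
    """Largest hidden width h such that [n_in, h, n_out] fits in the budget.

    params = n_in*h + h + h*n_out + n_out = h*(n_in + 1 + n_out) + n_out
    """
    return (budget - n_out) // (n_in + 1 + n_out)

deep = [784, 16, 16, 10]           # assumed widths, two hidden layers
budget = mlp_params(deep)

h = widest_single_hidden(budget, n_in=784, n_out=10)
shallow = [784, h, 10]             # one hidden layer within the same budget

print("deep   ", deep, mlp_params(deep))
print("shallow", shallow, mlp_params(shallow))
```

With these assumed sizes the budget buys either one 16-unit mixing step or two stacked 16-unit steps; the second hidden layer costs only a few hundred parameters, which is why depth is a cheap way to add another stage of composition here.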

Related cards
Builds on MNIST-style digits as a running example · Machine learning
Tasks
Question 1

Fully connecting every input pixel to a wide first hidden layer produces:

Hint

Reread the paragraph on fully connecting every input pixel in "Parameter counting and width vs depth intuition" before choosing. Eliminate options that contradict a definition stated in the card.

Question 2

Convolutional architectures share weights across spatial positions in order to:

Hint

Reread the paragraph on convolutional architectures sharing weights across spatial positions in "Parameter counting and width vs depth intuition" before choosing. Eliminate options that contradict a definition stated in the card.

Question 3

A single-layer perceptron (no hidden layer) can separate classes that are:

Hint

Reread the paragraph on the single-layer perceptron (no hidden layer) in "Parameter counting and width vs depth intuition" before choosing. Eliminate options that contradict a definition stated in the card.

Question 4

Why can the same parameter budget represent more compositional features when spread across depth?

Hint

Reread the paragraph on how the same parameter budget can represent more compositional features in "Parameter counting and width vs depth intuition" before choosing. Eliminate options that contradict a definition stated in the card.
