Local minima, saddles, and plateaus

Zero gradient marks a critical point, but not every critical point is a desirable resting place. A local minimum sits lower than all nearby points; a local maximum sits higher; a saddle curves upward along some directions and downward along others. In two dimensions a saddle looks like a horse saddle; in high dimensions saddles typically far outnumber isolated minima among the critical points.

Plateaus are broad regions where gradients are tiny even though you are not at a critical point. SGD slows to a crawl there because, for a fixed learning rate, each step is proportional to the gradient. Sigmoid-saturated networks famously suffered from vanishing gradients on plateaus; modern ReLU stacks reduce but do not eliminate flat spots.
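To make the slowdown concrete, here is a minimal sketch that runs plain gradient descent on a single sigmoid, once starting in its steep middle and once starting deep in its saturated tail. The function names, the learning rate of 0.5, and the 50-step budget are illustrative assumptions, not values from this card.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)); at most 0.25, near 0 when saturated.
    s = sigmoid(z)
    return s * (1.0 - s)

def descend(z0, lr=0.5, steps=50):
    # Plain gradient descent on f(z) = sigmoid(z); every step has size lr * gradient.
    z = z0
    for _ in range(steps):
        z -= lr * sigmoid_grad(z)
    return z

# Distance travelled in 50 steps from the steep region vs. from the plateau.
moved_near_origin = abs(descend(0.0) - 0.0)
moved_on_plateau = abs(descend(10.0) - 10.0)
print(moved_near_origin, moved_on_plateau)
```

Starting at z = 10 the gradient is on the order of 1e-5, so the iterate barely moves even though it is nowhere near a critical point.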

Classifying a critical point needs second-order information: the eigenvalues of the Hessian tell you whether curvature is positive in every direction (a local minimum), negative in every direction (a local maximum), or mixed (a saddle). That analysis is instructive on toy surfaces but rarely computed at neural-network scale. Minibatch noise can help escape shallow basins by jittering iterates across low walls.
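On a toy surface the eigenvalue test is easy to carry out. The sketch below assumes NumPy is available and that the Hessian at the critical point is already known; `classify_critical_point` and its tolerance are hypothetical names chosen for illustration.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    # eigvalsh is for symmetric matrices, which a Hessian of a smooth function is.
    eigvals = np.linalg.eigvalsh(hessian)
    if np.any(np.abs(eigvals) <= tol):
        return "degenerate"  # a zero eigenvalue makes the second-order test inconclusive
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    return "saddle"

# Hessians at the origin for three toy surfaces.
bowl = np.array([[2.0, 0.0], [0.0, 2.0]])      # f(x, y) = x^2 + y^2
dome = np.array([[-2.0, 0.0], [0.0, -2.0]])    # f(x, y) = -(x^2 + y^2)
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])   # f(x, y) = x^2 - y^2

for name, h in [("bowl", bowl), ("dome", dome), ("saddle", saddle)]:
    print(name, classify_critical_point(h))
```

For a network with millions of parameters the full Hessian is too large to form, which is why this check stays mostly a teaching tool.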

Zero gradient is necessary for a local minimum in smooth unconstrained problems, but it is not sufficient for a local minimum, let alone a global one: many critical points may exist, and descent finds only one basin, depending on initialization and noise.
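A one-dimensional sketch shows the dependence on initialization. The quartic f(x) = x^4 - 3x^2 + x is an assumed example with two minima of different depth; the learning rate and step count are likewise illustrative.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    # Deterministic gradient descent; the result depends only on the starting point.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

left = gradient_descent(-2.0)   # settles in the deeper basin near x = -1.30
right = gradient_descent(2.0)   # settles in the shallower basin near x = 1.13
print(left, f(left))
print(right, f(right))
```

Both runs end at a point with zero gradient, yet only the run started on the left reaches the lower of the two minima.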

Deep networks add another wrinkle: symmetries and reparameterizations can make different weight settings implement nearly identical input-output maps, so the landscape has many equivalent valleys. Optimization cares about function quality, not a unique parameter vector.
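Hidden-unit permutation is the simplest such symmetry, and in that case the two maps are exactly identical rather than merely close. The sketch below assumes NumPy and a tiny two-layer ReLU network with made-up shapes; `mlp` and the chosen permutation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hidden (4) -> output (2)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
    return W2 @ h + b2

# Reorder the hidden units: permute rows of W1 and b1, and the matching columns of W2.
perm = np.array([2, 0, 3, 1])
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=3)
same_output = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
different_weights = not np.allclose(W1, W1p)
print(same_output, different_weights)
```

The permuted weights sit at a different point in parameter space but compute the same function, so every minimum has permuted copies with the same loss.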

