Local minima, saddles, and plateaus

Beginner Machine learning
Created by Best · 01.06.2026 at 06:20 UTC

A zero gradient marks a critical point, but not every critical point is a desirable resting place. A local minimum sits lower than all nearby points; a local maximum sits higher; a saddle curves upward along some directions and downward along others. In low dimensions saddles look like horse saddles; in high dimensions they are typically far more numerous than isolated minima among the critical points.

Plateaus are broad regions where gradients are tiny even though you are not at a critical point. SGD slows to a crawl there because each step is proportional to the slope, so a tiny slope means a tiny step. Sigmoid-saturated networks famously suffered from vanishing gradients on plateaus; modern ReLU stacks reduce but do not eliminate flat spots.
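To make the "step is proportional to the slope" point concrete, here is a minimal sketch using the sigmoid function on its own rather than a full network. The learning rate and the sample inputs are assumed values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

learning_rate = 0.1  # assumed value, for illustration only

# A plain gradient step has size learning_rate * |slope|,
# so a saturated input (large |z|) yields an almost invisible step.
for z in (0.0, 5.0, 10.0):
    slope = sigmoid_grad(z)
    step = learning_rate * slope
    print(f"z={z:5.1f}  slope={slope:.2e}  step={step:.2e}")
```

The slope is largest at z = 0 and shrinks rapidly as the input moves into the saturated region, which is the flat-spot behaviour described above.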

Classifying a critical point needs second-order information: eigenvalues of the Hessian tell whether curvature is positive, negative, or mixed. That analysis is instructive on toy surfaces but rarely computed at neural-network scale. Minibatch noise can help escape shallow basins by jittering iterates across low walls.
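On a toy surface the eigenvalue test can be sketched in a few lines. The function name `classify_critical_point`, the tolerance, and the example surfaces are assumptions made for this illustration; the Hessians are written out by hand for each surface at the origin.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point from the eigenvalues of its Hessian."""
    # eigvalsh is for symmetric matrices and returns real eigenvalues.
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle"
    # Some curvature is (numerically) zero: the second-order test cannot decide.
    return "inconclusive"

# Hessians at the origin of four toy surfaces
bowl = np.array([[2.0, 0.0], [0.0, 2.0]])      # f = x^2 + y^2
hill = np.array([[-2.0, 0.0], [0.0, -2.0]])    # f = -(x^2 + y^2)
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])   # f = x^2 - y^2
flat_dir = np.array([[2.0, 0.0], [0.0, 0.0]])  # f = x^2 + y^4

for name, h in [("bowl", bowl), ("hill", hill),
                ("saddle", saddle), ("flat_dir", flat_dir)]:
    print(name, "->", classify_critical_point(h))
```

The last case shows why the test can be inconclusive: a zero eigenvalue means the quadratic approximation says nothing about that direction, so higher-order terms decide.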

Zero gradient is necessary for a local minimum in smooth unconstrained problems, but not sufficient for a global minimum: many critical points may exist, and descent only finds one basin depending on initialization and noise.
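The dependence on initialization can be illustrated with a one-dimensional function that has two basins. The function, the learning rate, the step count, and the starting points below are all assumptions picked for this sketch; it uses plain (noise-free) gradient descent.

```python
def f(x):
    # Toy cost with two local minima of different depth
    return x**4 - 3.0 * x**2 + x

def grad_f(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def gradient_descent(x0, learning_rate=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

x_left = gradient_descent(-2.0)   # starts in the left basin
x_right = gradient_descent(2.0)   # starts in the right basin

for label, x in (("left", x_left), ("right", x_right)):
    print(f"{label}: x={x:.4f}  f(x)={f(x):.4f}  grad={grad_f(x):.2e}")
```

Both runs end where the gradient is essentially zero, yet the two end points have different cost values, so only one of them can be the global minimum.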

Deep networks add another wrinkle: symmetries and reparameterizations can make different weight settings implement nearly identical input-output maps, so the landscape has many equivalent valleys. Optimization cares about function quality, not a unique parameter vector.
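One simple symmetry of this kind is reordering the hidden units of a layer. The sketch below assumes a tiny one-hidden-layer ReLU network with random weights; the layer sizes, the seed, and the particular permutation are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # One hidden ReLU layer followed by a linear output layer
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2

# Assumed sizes: 3 inputs, 4 hidden units, 2 outputs
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 3))

# Reorder the hidden units: permute the columns of W1 and entries of b1,
# and apply the same permutation to the rows of W2.
perm = np.array([2, 0, 3, 1])
W1_p, b1_p, W2_p = W1[:, perm], b1[perm], W2[perm, :]

out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1_p, b1_p, W2_p, b2)

same_function = np.allclose(out_original, out_permuted)
different_weights = not np.allclose(W1, W1_p)
print("same outputs:", same_function, "| different weights:", different_weights)
```

The two parameter vectors sit at different points of the landscape but compute the same function, so they have the same loss on any dataset.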

Tasks
Question 1

A saddle point of the cost surface has:

Hint

Skim the paragraph on saddle points in Local minima, saddles, and plateaus before choosing. Eliminate options that contradict a definition stated in the card.

Question 2

Plateau regions slow SGD because:

Hint

Skim the paragraph on why plateaus slow SGD in Local minima, saddles, and plateaus before choosing. Eliminate options that contradict a definition stated in the card.

Question 3

The noise in minibatch gradients can help the optimizer escape:

Hint

Skim the paragraph on minibatch noise in Local minima, saddles, and plateaus before choosing. Eliminate options that contradict a definition stated in the card.

Question 4

Why is a zero gradient necessary but not sufficient for a global minimum?

Hint

Skim the paragraph on why a zero gradient is necessary but not sufficient in Local minima, saddles, and plateaus before choosing. Eliminate options that contradict a definition stated in the card.
