Local minima, saddles, and plateaus
Zero gradient marks a critical point, but not every critical point is a desirable resting place. A local minimum sits no higher than any nearby point; a local maximum sits no lower; a saddle curves upward along some directions and downward along others. In two dimensions a saddle looks like a horse saddle; in high dimensions saddles typically far outnumber isolated minima among the critical points.

Plateaus are broad regions where gradients are tiny even though you are not at a critical point. SGD slows to a crawl there because each step is proportional to the gradient. Sigmoid-saturated networks famously suffered from vanishing gradients on plateaus; modern ReLU stacks reduce but do not eliminate flat spots.
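The slowdown on a plateau can be seen on a one-dimensional toy problem. The sketch below is an illustration under assumed settings, not a recipe: it runs plain gradient descent on the loss sigmoid(z), once from the steep middle of the curve (z = 0) and once from the saturated region (z = 10). The function names, learning rate, and step count are all hypothetical choices made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at z = 0).
    s = sigmoid(z)
    return s * (1.0 - s)

def distance_moved(z0, lr=1.0, steps=100):
    # Plain gradient descent on loss(z) = sigmoid(z); illustrative settings only.
    z = z0
    for _ in range(steps):
        z -= lr * sigmoid_grad(z)
    return abs(z - z0)

moved_from_slope = distance_moved(0.0)     # starts where the gradient is largest
moved_from_plateau = distance_moved(10.0)  # starts deep in the saturated region

print("moved from z=0: ", moved_from_slope)
print("moved from z=10:", moved_from_plateau)
```

Both runs use the same learning rate and the same number of steps; the only difference is the size of the gradient at the starting point, which is why the saturated run barely moves.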

Classifying a critical point needs second-order information: the eigenvalues of the Hessian tell whether curvature is positive, negative, or mixed. That analysis is instructive on toy surfaces, but the full Hessian is rarely computed at neural-network scale because it has one entry per pair of parameters. Minibatch noise can help escape shallow basins by jittering iterates across low walls.

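On a toy surface the eigenvalue test is a few lines. The following is a minimal sketch that assumes the Hessian at the critical point is already available as a symmetric NumPy array; the helper name `classify_critical_point` and the tolerance `tol` are hypothetical, introduced only for this example.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    eigvals = np.linalg.eigvalsh(hessian)
    if np.all(eigvals > tol):
        return "local minimum"       # curvature positive in every direction
    if np.all(eigvals < -tol):
        return "local maximum"       # curvature negative in every direction
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle"              # mixed curvature
    return "inconclusive"            # some eigenvalue is (near) zero

# Hessians at the origin of three toy surfaces.
bowl = np.array([[2.0, 0.0], [0.0, 2.0]])      # f(x, y) = x^2 + y^2
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])   # f(x, y) = x^2 - y^2
dome = np.array([[-2.0, 0.0], [0.0, -2.0]])    # f(x, y) = -(x^2 + y^2)

for name, h in [("bowl", bowl), ("saddle", saddle), ("dome", dome)]:
    print(name, "->", classify_critical_point(h))
```

A near-zero eigenvalue leaves the second-order test inconclusive, which is why the helper returns a fourth label instead of forcing one of the three classes.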
Zero gradient is necessary for a local minimum in smooth unconstrained problems, but it is not sufficient, and it says nothing about global optimality: many critical points may exist, and descent settles into only one basin, chosen by initialization and noise.

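The dependence on initialization shows up even in one dimension. As a hedged illustration, the sketch below runs plain gradient descent on the hand-picked function f(x) = x^4 - 3x^2 + x, which has two basins of different depth; the function, learning rate, and starting points are assumptions chosen for this example.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def f_grad(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=500):
    # Plain, noise-free gradient descent; illustrative settings only.
    x = x0
    for _ in range(steps):
        x -= lr * f_grad(x)
    return x

x_left = gradient_descent(-2.0)   # starts on the left side
x_right = gradient_descent(2.0)   # starts on the right side

print("from x0=-2:", x_left, "f =", f(x_left))
print("from x0=+2:", x_right, "f =", f(x_right))
```

Both runs end at a point with essentially zero gradient, but only the left one reaches the deeper basin; the right one stops in a shallower local minimum.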
Deep networks add another wrinkle: symmetries and reparameterizations can make different weight settings implement identical or nearly identical input-output maps, so the landscape has many equivalent valleys. What matters for optimization is the quality of the learned function, not which of the equivalent parameter vectors is found.
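One such symmetry is easy to check directly: permuting the hidden units of a layer, together with the matching rows of the next layer's weights, changes the parameter vector but not the function. The sketch below assumes a tiny two-layer ReLU network with random weights; the layer sizes, the seed, and the helper name `mlp` are hypothetical choices for illustration.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # Two-layer network: ReLU hidden layer followed by a linear output layer.
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden units
b1 = rng.normal(size=4)
W2 = rng.normal(size=(4, 2))   # 4 hidden units -> 2 outputs
b2 = rng.normal(size=2)
x = rng.normal(size=(5, 3))    # batch of 5 inputs

# Reorder the hidden units: columns of W1, entries of b1, rows of W2.
perm = np.array([2, 0, 3, 1])
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

same_output = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
different_weights = not np.allclose(W1, W1p)

print("same outputs:", same_output)
print("different weights:", different_weights)
```

With 4 hidden units there are already 24 such reorderings, each a distinct point in weight space that computes the same function.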
Card Info
- Topic: Machine learning
- Difficulty: Beginner