NeurIPS 2025 Tutorial: Theoretical Insights on Training Instability in Deep Learning

Jingfeng Wu
UC Berkeley
Yu-Xiang Wang
UC San Diego
Maryam Fazel
UW

Overview

Advances in deep learning are built on the dark arts of gradient-based optimization. In deep learning, the optimization process is often oscillatory, spiky, and unstable. This makes little sense in classical optimization theory, which primarily operates in a well-behaved, stable regime. Yet the best-performing training configurations in practice consistently operate in an unstable regime. This tutorial introduces recent theoretical progress in understanding the benign nature of training instabilities, offering new insights from both optimization and statistical learning perspectives.
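To make the contrast with classical theory concrete, below is a minimal, self-contained sketch (illustrative only, not part of the tutorial materials) of gradient descent on a one-dimensional quadratic f(x) = (lam/2) x^2. Classical analysis predicts a sharp stability threshold at stepsize 2/lam: below it the iterates converge (oscillating in sign once the stepsize exceeds 1/lam), and above it they diverge.

```python
# Illustrative sketch (not from the tutorial): gradient descent on the
# quadratic f(x) = (lam / 2) * x**2, whose gradient is lam * x.
# The update x_{t+1} = (1 - eta * lam) * x_t is stable iff |1 - eta * lam| < 1,
# i.e. iff eta < 2 / lam -- the classical stability threshold.

def gd_on_quadratic(eta, lam=1.0, x0=1.0, steps=20):
    """Run gradient descent with stepsize eta and return the iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append((1.0 - eta * lam) * xs[-1])
    return xs

for eta, label in [(0.5, "stable, monotone"),
                   (1.8, "oscillatory, still convergent"),
                   (2.2, "unstable, divergent")]:
    xs = gd_on_quadratic(eta)
    print(f"eta = {eta} ({label}): final |x| = {abs(xs[-1]):.2e}")
```

In deep learning, practical training configurations routinely sit beyond this classical threshold yet continue to make progress; reconciling this gap is the subject of the tutorial.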

Resources

References

Large stepsizes accelerate optimization

More references

Large stepsizes prevent overfitting

More references

Science of large stepsizes

More references

Instability from optimizer and landscape codesign

Other instabilities