Advances in deep learning build on the dark arts of gradient-based optimization. In deep learning, the optimization process is oscillatory, spiky, and unstable. This behavior makes little sense in classical optimization theory, which primarily analyzes a well-behaved, stable regime; yet in practice the best training configurations consistently operate in an unstable regime. This tutorial introduces recent theoretical progress in understanding the benign nature of these training instabilities, offering new insights from both optimization and statistical learning perspectives.
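To make the notion of a "stable regime" concrete, here is a minimal sketch (not from the tutorial) of gradient descent on a one-dimensional quadratic: classical theory guarantees monotone loss decrease only when the step size stays below 2 divided by the curvature, and crossing that threshold produces the oscillatory, spiky behavior described above. The curvature and step-size values are illustrative assumptions.

```python
# Gradient descent on f(w) = 0.5 * L * w^2, whose sharpness (the Hessian) is L.
# Classical analysis requires eta < 2 / L for the loss to decrease monotonically;
# above that threshold the iterates flip sign and the loss grows, i.e. training
# enters an unstable regime.
L = 10.0  # curvature (sharpness); 2 / L = 0.2 is the classical stability threshold

for eta, label in [(0.15, "stable   (eta < 2/L)"), (0.21, "unstable (eta > 2/L)")]:
    w = 1.0
    losses = []
    for _ in range(10):
        w = w - eta * L * w          # GD update: w <- w - eta * f'(w)
        losses.append(0.5 * L * w ** 2)
    print(label, ["%.3f" % v for v in losses])
```

Running this prints a monotonically shrinking loss for the stable step size and a growing, oscillating loss for the unstable one, which is the textbook picture that the tutorial's "benign instability" results go beyond.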