Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Zhu, Xingyu; Wang, Zixuan; Wang, Xiang; Zhou, Mo; Ge, Rong

Computer Science > Machine Learning

arXiv:2210.03294 (cs)

[Submitted on 7 Oct 2022 (v1), last revised 21 Feb 2023 (this version, v2)]

Abstract: Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold $2/\eta$ (where $\eta$ is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below $2/\eta$. Many other well-understood nonconvex objectives, such as matrix factorization or two-layer networks, can also converge despite large sharpness, but there is often a larger gap between the sharpness of the endpoint and $2/\eta$. In this paper, we study the EoS phenomenon by constructing a simple function that has the same behavior. We give a rigorous analysis of its training dynamics in a large local region and explain why the final converging point has sharpness close to $2/\eta$. Globally, we observe that the training dynamics for our example have an interesting bifurcating behavior, which was also observed in the training of neural nets.
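Why $2/\eta$ is the "stability threshold": on a quadratic $f(x) = \frac{\lambda}{2}x^2$, whose sharpness is $\lambda$, the gradient descent update is $x \leftarrow x - \eta\lambda x = (1 - \eta\lambda)x$. The iterates shrink only when $|1 - \eta\lambda| < 1$, i.e. $0 < \lambda < 2/\eta$. The sketch below illustrates this standard fact. It assumes a one-dimensional toy quadratic that is NOT the paper's construction, and the helper names `gd_on_quadratic` and `is_stable` are hypothetical.

```python
"""Toy illustration of the 2/eta stability threshold for gradient descent.

Assumption: this quadratic is a hypothetical textbook example, not the
function constructed in the paper; it only shows why sharpness above 2/eta
makes plain gradient descent unstable on a purely quadratic loss.
"""


def gd_on_quadratic(sharpness: float, eta: float, x0: float = 1.0, steps: int = 50) -> list[float]:
    """Run gradient descent on f(x) = sharpness / 2 * x**2 and return all iterates.

    Each step multiplies x by (1 - eta * sharpness), so the iterates shrink
    iff |1 - eta * sharpness| < 1, i.e. 0 < sharpness < 2 / eta.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - eta * sharpness * x  # f'(x) = sharpness * x
        xs.append(x)
    return xs


def is_stable(sharpness: float, eta: float) -> bool:
    """Linear-stability test: the per-step contraction factor must be below 1 in magnitude."""
    return abs(1.0 - eta * sharpness) < 1.0


if __name__ == "__main__":
    eta = 0.1  # step size, so the threshold 2 / eta is 20
    for lam in (15.0, 19.9, 20.1, 25.0):
        xs = gd_on_quadratic(lam, eta)
        print(f"sharpness={lam:5.1f} stable={is_stable(lam, eta)} |x_50|={abs(xs[-1]):.3e}")
```

On a quadratic, sharpness above $2/\eta$ means the iterates oscillate with growing amplitude and the loss diverges. The EoS observation described above is that neural network losses still converge in the long run in this regime, which a purely quadratic model cannot capture; that gap is what the paper's non-quadratic example addresses.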

Comments:	53 pages, 19 figures
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
ACM classes:	I.2.6
Cite as:	arXiv:2210.03294 [cs.LG]
	(or arXiv:2210.03294v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.03294

Submission history

From: Xingyu Zhu
[v1] Fri, 7 Oct 2022 02:57:05 UTC (10,245 KB)
[v2] Tue, 21 Feb 2023 09:45:37 UTC (16,263 KB)