🤖 AI Summary
This work addresses the convergence challenges of asynchronous adaptive first-order methods in non-convex stochastic optimization. It proposes a class of parallel asynchronous adaptive algorithms that support momentum and inexact normalization and that includes asynchronous variants of several mainstream optimizers. In a fully stochastic setting, the paper establishes, for the first time, an $O(1/\sqrt{t})$ convergence rate (up to logarithmic factors) for such methods on non-convex objectives. The analysis combines techniques from asynchronous parallel computation, adaptive learning rates, and stochastic optimization. Numerical experiments suggest that these algorithms are efficient and practical in heterogeneous large-scale machine learning systems.
📝 Abstract
A new class of asynchronous adaptive first-order optimization methods is introduced, comprising asynchronous variants of several popular algorithms. Versions of these methods using momentum and/or inexact normalization are also considered. The convergence of methods in the class on non-convex functions is analyzed in a fully stochastic setting, and is shown to be (up to logarithmic factors) of order $O(1/\sqrt{t})$ under reasonable assumptions. Numerical experiments suggest that such asynchronous adaptive algorithms are very relevant in heterogeneous large-scale machine learning systems.
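
To make the setting more concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It simulates, in a single process, an AdaGrad-style adaptive update in which each stochastic gradient is evaluated at a stale iterate, as could happen when workers report back asynchronously. The test function, the uniform random delay model, the Gaussian noise, and all names (`grad`, `async_adagrad`, `max_delay`, `noise`) are assumptions chosen for illustration only.

```python
import random
from collections import deque


def grad(x):
    # Gradient of f(x) = sum_i x_i^2 / (1 + x_i^2), a smooth non-convex
    # test function (an assumption; not taken from the paper).
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def async_adagrad(x0, steps=2000, lr=0.1, max_delay=3, noise=0.1,
                  eps=1e-8, seed=0):
    """Simulate AdaGrad-style updates driven by delayed, noisy gradients."""
    rng = random.Random(seed)
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients per coordinate
    # Keep the most recent iterates so a "worker" can read a stale one.
    history = deque([list(x)], maxlen=max_delay + 1)
    for _ in range(steps):
        # Hypothetical delay model: uniform over the iterates still stored.
        delay = rng.randint(0, len(history) - 1)
        stale_x = history[-1 - delay]
        # Stochastic gradient: exact gradient at the stale point plus noise.
        g = [gi + rng.gauss(0.0, noise) for gi in grad(stale_x)]
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Adaptive step: normalize by the root of the accumulated squares.
            x[i] -= lr * gi / (accum[i] ** 0.5 + eps)
        history.append(list(x))
    return x


if __name__ == "__main__":
    print(async_adagrad([1.5, -0.8]))
```

Setting `max_delay=0` recovers the synchronous update in this sketch, which gives a simple baseline for observing how staleness affects the iterates.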