🤖 AI Summary
This work addresses the challenges posed by heavy-tailed gradient noise and highly heterogeneous computation times across workers in asynchronous stochastic nonconvex optimization. The authors propose a momentum-based asynchronous normalized stochastic gradient descent algorithm that, under the mild assumption of bounded $p$-th central moments of the gradient noise for $p \in (1,2]$, achieves optimal time complexity in arbitrarily heterogeneous computing environments. Theoretical analysis establishes both convergence and optimality of the proposed method, while numerical experiments further demonstrate its robustness and effectiveness in settings with heavy-tailed noise distributions.
📝 Abstract
This paper considers the problem of asynchronous stochastic nonconvex optimization with heavy-tailed gradient noise and arbitrarily heterogeneous computation times across workers. We propose an asynchronous normalized stochastic gradient descent algorithm with momentum. Our analysis shows that the method achieves the optimal time complexity under the assumption of a bounded $p$-th central moment of the gradient noise with $p\in(1,2]$. We also provide numerical experiments demonstrating the effectiveness of the proposed method.
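For intuition, the sketch below shows a generic normalized SGD-with-momentum update of the kind the abstract refers to: the momentum buffer averages (possibly stale) stochastic gradients and the step is normalized so its size does not blow up under heavy-tailed noise. This is an illustrative sketch only; the paper's asynchronous scheduling, step sizes, and parameter names (`beta`, `eta`) are assumptions here, not the authors' exact algorithm.

```python
import numpy as np

def normalized_momentum_sgd_step(x, m, grad, beta=0.9, eta=0.01):
    """One generic normalized SGD-with-momentum update (illustrative sketch).

    x    : current iterate
    m    : momentum buffer
    grad : stochastic gradient, possibly computed from a stale iterate
           by an asynchronous worker
    """
    # Momentum: exponential average of incoming stochastic gradients.
    m = beta * m + (1.0 - beta) * grad
    # Normalized step: the update direction has unit norm, so a single
    # heavy-tailed gradient sample cannot produce an arbitrarily large step.
    x = x - eta * m / (np.linalg.norm(m) + 1e-12)
    return x, m
```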