🤖 AI Summary
Communication overhead in distributed machine learning remains prohibitive, and integrating acceleration with gradient compression has been theoretically challenging due to incompatibility between acceleration dynamics and contractive compressors.
Method: This paper proposes a unified optimization framework synergistically combining Nesterov acceleration, shrinking-type compression, error feedback, and gradient-difference compression.
Contribution/Results: We establish, for the first time under general convex settings, the optimal accelerated convergence rate (O(1/T^2)) for stochastic distributed optimization with compressed communication—resolving the long-standing theoretical barrier of incompatibility between acceleration and contractive compression. Our rigorous analysis precisely characterizes the coupled impact of compression-induced errors and accelerated dynamics. Empirical evaluations demonstrate that the method reduces communication volume by up to 90% while preserving convergence speed comparable to uncompressed accelerated algorithms.
📝 Abstract
Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization with contractive compression remains limited, particularly in conjunction with Nesterov acceleration -- a cornerstone for achieving faster convergence in optimization. In this paper, we propose a novel algorithm, ADEF (Accelerated Distributed Error Feedback), which integrates Nesterov acceleration, contractive compression, error feedback, and gradient difference compression. We prove that ADEF achieves the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex regime. Numerical experiments validate our theoretical findings and demonstrate the practical efficacy of ADEF in reducing communication costs while maintaining fast convergence.