Accelerated Distributed Optimization with Compression and Error Feedback

📅 2025-03-11

📈 Citations: 0

✨ Influential: 0

career value

225K/year

🤖 AI Summary

Communication overhead in distributed machine learning remains prohibitive, and integrating acceleration with gradient compression has been theoretically challenging due to incompatibility between acceleration dynamics and contractive compressors. Method: This paper proposes a unified optimization framework synergistically combining Nesterov acceleration, shrinking-type compression, error feedback, and gradient-difference compression. Contribution/Results: We establish, for the first time under general convex settings, the optimal accelerated convergence rate (O(1/T^2)) for stochastic distributed optimization with compressed communication—resolving the long-standing theoretical barrier of incompatibility between acceleration and contractive compression. Our rigorous analysis precisely characterizes the coupled impact of compression-induced errors and accelerated dynamics. Empirical evaluations demonstrate that the method reduces communication volume by up to 90% while preserving convergence speed comparable to uncompressed accelerated algorithms.

Technology Category

Application Category

📝 Abstract

Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization with contractive compression remains limited, particularly in conjunction with Nesterov acceleration -- a cornerstone for achieving faster convergence in optimization. In this paper, we propose a novel algorithm, ADEF (Accelerated Distributed Error Feedback), which integrates Nesterov acceleration, contractive compression, error feedback, and gradient difference compression. We prove that ADEF achieves the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex regime. Numerical experiments validate our theoretical findings and demonstrate the practical efficacy of ADEF in reducing communication costs while maintaining fast convergence.

Problem

Research questions and friction points this paper is trying to address.

Reduces communication overhead in distributed optimization

Integrates Nesterov acceleration with contractive compression

Achieves accelerated convergence in stochastic distributed optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Nesterov acceleration and compression

Uses error feedback for improved convergence

Reduces communication costs with gradient compression

🔎 Similar Papers

LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression