Accelerated Distributed Optimization with Compression and Error Feedback

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Communication overhead in distributed machine learning remains prohibitive, and integrating acceleration with gradient compression has been theoretically challenging due to the incompatibility between acceleration dynamics and contractive compressors. Method: This paper proposes a unified optimization framework combining Nesterov acceleration, contractive compression, error feedback, and gradient-difference compression. Contribution/Results: We establish, for the first time in the general convex setting, the optimal accelerated convergence rate O(1/T^2) for stochastic distributed optimization with compressed communication, resolving the long-standing theoretical barrier of incompatibility between acceleration and contractive compression. Our analysis precisely characterizes the coupled impact of compression-induced errors and accelerated dynamics. Empirical evaluations demonstrate that the method reduces communication volume by up to 90% while preserving convergence speed comparable to uncompressed accelerated algorithms.

📝 Abstract
Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization with contractive compression remains limited, particularly in conjunction with Nesterov acceleration -- a cornerstone for achieving faster convergence in optimization. In this paper, we propose a novel algorithm, ADEF (Accelerated Distributed Error Feedback), which integrates Nesterov acceleration, contractive compression, error feedback, and gradient difference compression. We prove that ADEF achieves the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex regime. Numerical experiments validate our theoretical findings and demonstrate the practical efficacy of ADEF in reducing communication costs while maintaining fast convergence.
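The error-feedback mechanism central to the abstract can be sketched generically. The snippet below shows one standard error-feedback step with a top-k contractive compressor; it illustrates the general technique, not ADEF's exact update rule, and the function names are illustrative:

```python
import numpy as np

def top_k(x, k):
    """Top-k sparsifier: a standard contractive compressor that keeps
    the k largest-magnitude entries and zeroes the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def ef_sgd_step(w, grad, error, lr=0.1, k=2):
    """One generic error-feedback step (illustrative, not ADEF):
    compress the accumulated (error + scaled gradient) signal,
    apply the compressed part, and carry the residual forward."""
    corrected = error + lr * grad     # add back past compression error
    update = top_k(corrected, k)      # contractive compression; this is transmitted
    new_error = corrected - update    # residual kept locally for the next round
    return w - update, new_error
```

The key invariant is that `update + new_error` always equals the uncompressed signal, so no gradient information is permanently lost; it is merely delayed.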
Problem

Research questions and friction points this paper is trying to address.

High communication overhead in distributed optimization
Incompatibility between Nesterov acceleration dynamics and contractive compressors
Lack of accelerated convergence guarantees for stochastic distributed optimization with compressed communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Nesterov acceleration with contractive compression in a single algorithm (ADEF)
Uses error feedback to correct compression-induced bias
Compresses gradient differences to further reduce communication costs
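Gradient-difference compression, the last component listed above, transmits only a compressed difference to a reference vector that client and server both maintain, in the style of DIANA-type methods. The sketch below is a generic illustration under that assumption, not ADEF's exact rule; `top1` and `diff_compress_round` are illustrative names:

```python
import numpy as np

def top1(x):
    """Keep only the largest-magnitude coordinate (a contractive compressor)."""
    out = np.zeros_like(x)
    i = np.argmax(np.abs(x))
    out[i] = x[i]
    return out

def diff_compress_round(grad, ref):
    """One round of gradient-difference compression (DIANA-style sketch):
    compress grad - ref; both sides add the message to the shared
    reference, so only the small difference travels over the network."""
    msg = top1(grad - ref)   # cheap to transmit; shrinks as gradients stabilize
    new_ref = ref + msg      # server applies the identical update
    return msg, new_ref
```

Because the reference tracks the gradient over rounds, the transmitted differences vanish as the iterates converge, which is what keeps the compression error from destroying the accelerated rate.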
Yuan Gao
CISPA Helmholtz Center for Information Security, Germany; Universität des Saarlandes, Germany
Anton Rodomanov
CISPA Helmholtz Center for Information Security
Optimization · Machine Learning · Numerical Methods · Complexity Guarantees
Jeremy Rack
CISPA Helmholtz Center for Information Security, Germany; Universität des Saarlandes, Germany
Sebastian U. Stich
CISPA Helmholtz Center for Information Security, Germany