🤖 AI Summary
Decentralized stochastic convex optimization (SCO) suffers from a parallel-scalability bottleneck: convergence degrades significantly once the number of machines exceeds a critical threshold. To address this, we propose Decentralized Anytime SGD (DA-SGD), the first method that provably raises the critical parallelism limit within the SCO framework, narrowing the statistical gap between decentralized and centralized learning. DA-SGD integrates an anytime iteration design, graph signal processing, and rigorous convergence analysis to achieve centralized-optimal convergence rates on highly connected topologies. Theoretically, it establishes a higher upper bound on achievable parallelism than state-of-the-art methods. Empirically, DA-SGD shows no performance degradation under multi-machine scaling, maintaining stable convergence and solution accuracy as the number of machines grows.
📝 Abstract
Decentralized learning has emerged as a powerful approach for handling large datasets across multiple machines in a communication-efficient manner. However, such methods often face scalability limitations, as increasing the number of machines beyond a certain point negatively impacts convergence rates. In this work, we propose Decentralized Anytime SGD, a novel decentralized learning algorithm that significantly extends the critical parallelism threshold, enabling the effective use of more machines without compromising performance. Within the stochastic convex optimization (SCO) framework, we establish a theoretical upper bound on parallelism that surpasses the current state-of-the-art, allowing larger networks to achieve favorable statistical guarantees and closing the gap with centralized learning in highly connected topologies.
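To make the setting concrete, here is a rough, generic sketch of decentralized SGD with an anytime-style averaged query point on a synthetic quadratic objective. This is an illustration of the general framework only, not the paper's DA-SGD algorithm; the ring topology, gossip matrix, step size, and problem instance are all assumptions chosen for the demo.

```python
import numpy as np

# Generic sketch (NOT the paper's DA-SGD): each node holds a local iterate,
# gossips with its ring neighbors via a doubly stochastic matrix W, and
# evaluates stochastic gradients at an anytime-style running average of
# its iterates rather than at the latest iterate.

rng = np.random.default_rng(0)
n_nodes, dim, T, eta = 4, 5, 1000, 0.05  # assumed demo parameters

# Doubly stochastic gossip matrix for a 4-node ring.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

x_star = rng.normal(size=dim)           # optimum of f(x) = 0.5 * ||x - x_star||^2
x = rng.normal(size=(n_nodes, dim))     # per-node iterates
avg = x.copy()                          # per-node running averages (query points)

for t in range(1, T + 1):
    # Noisy gradient of the shared quadratic, queried at the averaged point.
    grads = (avg - x_star) + 0.1 * rng.normal(size=(n_nodes, dim))
    x = W @ x - eta * grads             # gossip step + local SGD step
    avg = W @ avg                       # gossip the running averages as well
    avg = (t * avg + x) / (t + 1)       # anytime-style update of the average

err = np.linalg.norm(avg.mean(axis=0) - x_star)
print(round(err, 3))
```

Querying gradients at the running average (the "anytime" idea) is what lets the averaged point inherit the convergence guarantee directly; the gossip step controls how quickly consensus error decays, which is where the network topology enters the analysis.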