Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework

📅 2025-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Decentralized optimization over heterogeneous devices is hampered by uneven computation speeds and unpredictable communication delays. Method: We propose Asynchronous Decentralized Stochastic Gradient Descent (ADSGD), a center-free algorithm that needs no centralized coordination. Contribution/Results: We establish, for the first time, its convergence guarantee for non-convex objectives without assuming bounded data heterogeneity. To obtain step sizes that do not depend on computation delays, we develop an Asynchronous Stochastic Block Coordinate Descent (ASBCD) analytical framework. Experiments show that ADSGD converges faster in wall-clock time than existing methods, is efficient in memory and communication, and is robust to both computation and communication delays, making it well-suited for real-world distributed learning settings.

📝 Abstract

Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and unpredictable communication delays. This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assumptions of bounded computation and communication times. To understand the convergence of ADSGD, we first analyze Asynchronous Stochastic Block Coordinate Descent (ASBCD) as a tool, and then show that ADSGD converges under computation-delay-independent step sizes. The convergence result is established without assuming bounded data heterogeneity. Empirical experiments reveal that ADSGD outperforms existing methods in wall-clock convergence time across various scenarios. With its simplicity, efficiency in memory and communication, and resilience to communication and computation delays, ADSGD is well-suited for real-world decentralized learning tasks.
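To make the setting concrete, here is a minimal toy simulation of asynchronous decentralized SGD with bounded communication delays. It is an illustrative sketch only, not the paper's ADSGD: the ring topology, the scalar quadratic losses, the uniform mixing weights, and every name (`simulate_async_decentralized_sgd`, `targets`, `max_delay`, and so on) are assumptions made for this example. One randomly chosen node wakes up per tick, averages its iterate with the possibly stale neighbor copies it has received, takes a noisy gradient step on its own loss, and sends the result to its neighbors with a random bounded delay.

```python
import random


def simulate_async_decentralized_sgd(targets, lr=0.02, max_delay=5,
                                     ticks=20000, noise_std=0.1, seed=0):
    """Toy event-driven simulation on a ring (hypothetical setup, not the paper's ADSGD).

    Node i holds the local loss f_i(x) = 0.5 * (x - targets[i])**2, so the
    minimizer of the average loss is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # local iterate of each node
    neighbors = [((i - 1) % n, (i + 1) % n) for i in range(n)]
    # buffer[i][j] = (send_tick, value): newest copy of x_j that node i has received
    buffer = [{j: (-1, 0.0) for j in neighbors[i]} for i in range(n)]
    in_flight = []  # messages as (arrival_tick, send_tick, sender, receiver, value)

    for t in range(ticks):
        # Deliver every message whose bounded delay has elapsed.
        pending = []
        for arrival, sent, src, dst, val in in_flight:
            if arrival <= t:
                if sent > buffer[dst][src][0]:  # ignore out-of-order older copies
                    buffer[dst][src] = (sent, val)
            else:
                pending.append((arrival, sent, src, dst, val))
        in_flight = pending

        # One node wakes up; no global clock or coordinator is involved.
        i = rng.randrange(n)
        stale = [buffer[i][j][1] for j in neighbors[i]]
        mixed = (x[i] + sum(stale)) / (1 + len(stale))  # uniform averaging
        grad = (x[i] - targets[i]) + rng.gauss(0.0, noise_std)  # noisy local gradient
        x[i] = mixed - lr * grad

        # Send the new iterate to each neighbor with a random delay in [1, max_delay].
        for j in neighbors[i]:
            in_flight.append((t + rng.randint(1, max_delay), t, i, j, x[i]))
    return x


if __name__ == "__main__":
    local_targets = [1.0, 2.0, 3.0, 4.0, 5.0]  # deliberately heterogeneous local data
    print(simulate_async_decentralized_sgd(local_targets))
```

In this toy, the nodes end up close to the mean of `targets` even though their local minimizers differ. Because the step size is constant, a small disagreement between nodes remains; the sketch is meant to show the update and message pattern, not to reproduce the paper's guarantees.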

Problem

Research questions and friction points this paper is trying to address.

Heterogeneous computation speeds and unpredictable communication delays hinder practical decentralized optimization

Decentralized SGD needs an asynchronous formulation that is resilient to both computation and communication delays

Convergence should be guaranteed without assuming bounded data heterogeneity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Asynchronous Decentralized SGD with bounded delays

Convergence via Block-Coordinate Descent analysis

Computation-delay-independent step sizes for robustness
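The block-coordinate view behind the analysis can be illustrated with a small sketch of asynchronous stochastic block coordinate descent. This is a hedged toy example, not the paper's ASBCD framework: it assumes a 3-variable convex quadratic (the paper treats non-convex objectives), one coordinate per block, and a fixed step size chosen by hand. All names (`A`, `b_vec`, `async_stochastic_bcd`, `max_delay`) are hypothetical. Each step updates one randomly chosen block using a noisy partial gradient evaluated at a stale copy of the iterate that is at most `max_delay` steps old.

```python
import random
from collections import deque

# Hypothetical toy objective: f(x) = 0.5 * x^T A x - b_vec^T x, with A positive definite.
A = [[2.0, 0.5, 0.0],
     [0.5, 2.0, 0.5],
     [0.0, 0.5, 2.0]]
b_vec = [1.0, 2.0, 3.0]


def partial_grad(x, i):
    """Partial derivative of f with respect to coordinate (block) i."""
    return sum(A[i][j] * x[j] for j in range(len(x))) - b_vec[i]


def async_stochastic_bcd(lr=0.05, max_delay=4, steps=5000, noise_std=0.01, seed=0):
    """Block updates computed from stale iterates with bounded delay (illustrative only)."""
    rng = random.Random(seed)
    x = [0.0] * len(b_vec)
    # Keep only the last max_delay + 1 iterates; older ones can never be read.
    history = deque([list(x)], maxlen=max_delay + 1)
    for _ in range(steps):
        i = rng.randrange(len(x))                             # block to update
        d = rng.randint(0, min(max_delay, len(history) - 1))  # staleness of the read
        stale = history[-1 - d]
        g = partial_grad(stale, i) + rng.gauss(0.0, noise_std)  # noisy, delayed gradient
        x[i] -= lr * g                                        # applied to the current iterate
        history.append(list(x))
    return x


if __name__ == "__main__":
    x_final = async_stochastic_bcd()
    print(x_final, [partial_grad(x_final, i) for i in range(3)])
```

In this sketch the step size is fixed without reference to `max_delay`, and the partial gradients still shrink to near zero for this small problem. That is only an empirical observation about the toy; the paper's result concerns when such delay-independent choices are provably safe.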

🔎 Similar Papers

Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions