Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time

📅 2025-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses distributed optimization under smooth nonconvex and strongly convex settings, with heterogeneous data and arbitrary communication graphs. To overcome the limitations of conventional methods, such as reliance on spectral gap assumptions, slow convergence, and high communication overhead, the authors propose the Spanning Tree Push-Pull (STPP) framework, the first to leverage *two spanning trees* to decouple topological constraints. STPP integrates the Push-Pull consensus mechanism with stochastic gradient updates, enabling efficient optimization over arbitrary graphs, from sparse cyclic to dense exponential-degree topologies. Theoretically, STPP achieves linear speedup with transient iteration complexity at most $O(n^7)$ for nonconvex and $\widetilde{O}(n^3)$ for strongly convex objectives. Empirical evaluations confirm its superior scalability and convergence over state-of-the-art algorithms in large-scale node regimes.
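
For intuition, here is a minimal NumPy sketch of the Push-Pull recursion that STPP builds on: model parameters are pulled through a row-stochastic matrix `R` while gradient trackers are pushed through a column-stochastic matrix `C`. The ring-based matrices, step size, and quadratic losses below are illustrative assumptions for the demo; STPP's actual matrices come from two spanning trees of the communication graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 8, 3, 0.1

# Illustrative local quadratics f_i(x) = 0.5 * ||x - b_i||^2; the minimizer
# of their average is the mean of the b_i. (Assumed for the demo, not taken
# from the paper.)
B = rng.normal(size=(n, d))
grad = lambda X: X - B  # row i holds agent i's local gradient

# Toy mixing matrices over a directed ring: R is row-stochastic (agents pull
# model parameters), C is column-stochastic (agents push gradient trackers).
R = np.zeros((n, n))
C = np.zeros((n, n))
for i in range(n):
    R[i, i] = R[i, (i - 1) % n] = 0.5  # pull from the predecessor
    C[i, i] = C[(i + 1) % n, i] = 0.5  # push to the successor

X = rng.normal(size=(n, d))  # local model copies
Y = grad(X)                  # gradient trackers, initialized at local gradients
g_prev = Y.copy()

for _ in range(3000):
    X = R @ (X - alpha * Y)   # pull step: mix models, then descend
    g = grad(X)
    Y = C @ Y + g - g_prev    # push step: track the average gradient
    g_prev = g

print(np.allclose(X, B.mean(axis=0), atol=1e-3))  # all agents near the optimum
```

With exact gradients this recursion drives every agent to the common minimizer; STPP replaces the ring matrices with spanning-tree-based ones and uses stochastic gradients.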

📝 Abstract
We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients. Unlike prior approaches that rely heavily on spectral gap properties, STPP leverages a more flexible topological characterization, enabling robust information flow and efficient updates. Theoretically, we prove that STPP achieves linear speedup and polynomial transient iteration complexity, up to $O(n^7)$ for smooth nonconvex objectives and $\tilde{O}(n^3)$ for smooth strongly convex objectives, under arbitrary network topologies. Moreover, compared with the existing methods, STPP achieves faster convergence rates on sparse and non-regular topologies (e.g., directed ring) and reduces communication overhead on dense networks (e.g., static exponential graph). These results significantly advance the state of the art, especially when $n$ is large. Numerical experiments further demonstrate the strong performance of STPP and confirm the practical relevance of its theoretical convergence rates across various common graph architectures. Our code is available at https://anonymous.4open.science/r/SpanningTreePushPull-5D3E.
Problem

Research questions and friction points this paper is trying to address.

Distributed learning with heterogeneous data across agents
Efficient peer-to-peer communication for model updates
Achieving linear speedup and polynomial transient time
Innovation

Methods, ideas, or system contributions that make the work stand out.

The STPP algorithm uses two spanning trees extracted from the communication graph (a toy construction is sketched after this list)
Achieves linear speedup with polynomial transient iteration complexity
Remains efficient on sparse, dense, and otherwise arbitrary topologies
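
To make the dual-spanning-tree idea concrete, here is a hedged sketch that extracts one BFS spanning tree from an undirected graph and uses it in both orientations: children pull parameters from their parents (row-stochastic `R`) and push gradient mass to their parents (column-stochastic `C`). The paper's actual construction over general directed graphs may differ; everything below is an illustrative assumption.

```python
from collections import deque
import numpy as np

def bfs_tree(adj, root):
    """Parent pointers of a BFS spanning tree of an undirected graph."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def tree_matrices(n, parent):
    """Row-stochastic pull matrix R and column-stochastic push matrix C
    supported on the tree edges (plus self-loops)."""
    R, C = np.eye(n), np.eye(n)
    for v, p in parent.items():
        if p is None:
            continue  # the root keeps its own value
        R[v, v] = R[v, p] = 0.5  # v averages itself with its parent
        C[v, v] = 0.5            # v keeps half of its tracker mass...
        C[p, v] = 0.5            # ...and forwards half to its parent
    return R, C

# Example: a 6-node ring graph; the tree rooted at node 0 defines both matrices.
n = 6
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
R, C = tree_matrices(n, bfs_tree(adj, root=0))
assert np.allclose(R.sum(axis=1), 1.0)  # rows of R sum to one
assert np.allclose(C.sum(axis=0), 1.0)  # columns of C sum to one
```

Under this construction, parameters propagate from the root down the tree while gradient information aggregates toward the root, giving the decoupled information flow that the bullets above describe.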
🔎 Similar Papers
No similar papers found.
Runze You
School of Data Science (SDS), The Chinese University of Hong Kong, Shenzhen
Shi Pu
China Telecom Guizhou Branch