Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time

📅 2025-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses distributed optimization under smooth nonconvex and strongly convex settings, with heterogeneous data and arbitrary communication graphs. To overcome the limitations of conventional methods, such as reliance on spectral gap assumptions, slow convergence, and high communication overhead, the authors propose the Spanning Tree Push-Pull (STPP) framework, the first to leverage *two spanning trees* to decouple topological constraints. STPP integrates the Push-Pull consensus mechanism with stochastic gradient updates, enabling efficient optimization over arbitrary graphs, from sparse cyclic to dense exponential-degree topologies. Theoretically, STPP achieves linear speedup with transient iteration complexity at most $O(n^7)$ for nonconvex and $\widetilde{O}(n^3)$ for strongly convex objectives. Empirical evaluations confirm its superior scalability and convergence over state-of-the-art algorithms in large-scale node regimes.
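
For intuition, here is a minimal NumPy sketch of the Push-Pull recursion that STPP builds on: model parameters are pulled through a row-stochastic matrix `R` while gradient trackers are pushed through a column-stochastic matrix `C`. The ring-based matrices, step size, and quadratic losses below are illustrative assumptions for the demo; STPP's actual matrices come from two spanning trees of the communication graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 8, 3, 0.1

# Illustrative local quadratics f_i(x) = 0.5 * ||x - b_i||^2; the minimizer
# of their average is the mean of the b_i. (Assumed for the demo, not taken
# from the paper.)
B = rng.normal(size=(n, d))
grad = lambda X: X - B  # row i holds agent i's local gradient

# Toy mixing matrices over a directed ring: R is row-stochastic (agents pull
# model parameters), C is column-stochastic (agents push gradient trackers).
R = np.zeros((n, n))
C = np.zeros((n, n))
for i in range(n):
    R[i, i] = R[i, (i - 1) % n] = 0.5  # pull from the predecessor
    C[i, i] = C[(i + 1) % n, i] = 0.5  # push to the successor

X = rng.normal(size=(n, d))  # local model copies
Y = grad(X)                  # gradient trackers, initialized at local gradients
g_prev = Y.copy()

for _ in range(3000):
    X = R @ (X - alpha * Y)   # pull step: mix models, then descend
    g = grad(X)
    Y = C @ Y + g - g_prev    # push step: track the average gradient
    g_prev = g

print(np.allclose(X, B.mean(axis=0), atol=1e-3))  # all agents near the optimum
```

With exact gradients this recursion drives every agent to the common minimizer; STPP replaces the ring matrices with spanning-tree-based ones and uses stochastic gradients.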

📝 Abstract
We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients. Unlike prior approaches that rely heavily on spectral gap properties, STPP leverages a more flexible topological characterization, enabling robust information flow and efficient updates. Theoretically, we prove that STPP achieves linear speedup and polynomial transient iteration complexity, up to $O(n^7)$ for smooth nonconvex objectives and $\tilde{O}(n^3)$ for smooth strongly convex objectives, under arbitrary network topologies. Moreover, compared with the existing methods, STPP achieves faster convergence rates on sparse and non-regular topologies (e.g., directed ring) and reduces communication overhead on dense networks (e.g., static exponential graph). These results significantly advance the state of the art, especially when $n$ is large. Numerical experiments further demonstrate the strong performance of STPP and confirm the practical relevance of its theoretical convergence rates across various common graph architectures. Our code is available at https://anonymous.4open.science/r/SpanningTreePushPull-5D3E.
Problem

Research questions and friction points this paper is trying to address.

Distributed learning with heterogeneous data across agents
Efficient peer-to-peer communication for model updates
Achieving linear speedup and polynomial transient time
Innovation

Methods, ideas, or system contributions that make the work stand out.

The STPP algorithm uses two spanning trees extracted from the communication graph (a toy construction is sketched after this list)
Achieves linear speedup with polynomial transient iteration complexity
Remains efficient on sparse, dense, and otherwise arbitrary topologies
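
To make the dual-spanning-tree idea concrete, here is a hedged sketch that extracts one BFS spanning tree from an undirected graph and uses it in both orientations: children pull parameters from their parents (row-stochastic `R`) and push gradient mass to their parents (column-stochastic `C`). The paper's actual construction over general directed graphs may differ; everything below is an illustrative assumption.

```python
from collections import deque
import numpy as np

def bfs_tree(adj, root):
    """Parent pointers of a BFS spanning tree of an undirected graph."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def tree_matrices(n, parent):
    """Row-stochastic pull matrix R and column-stochastic push matrix C
    supported on the tree edges (plus self-loops)."""
    R, C = np.eye(n), np.eye(n)
    for v, p in parent.items():
        if p is None:
            continue  # the root keeps its own value
        R[v, v] = R[v, p] = 0.5  # v averages itself with its parent
        C[v, v] = 0.5            # v keeps half of its tracker mass...
        C[p, v] = 0.5            # ...and forwards half to its parent
    return R, C

# Example: a 6-node ring graph; the tree rooted at node 0 defines both matrices.
n = 6
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
R, C = tree_matrices(n, bfs_tree(adj, root=0))
assert np.allclose(R.sum(axis=1), 1.0)  # rows of R sum to one
assert np.allclose(C.sum(axis=0), 1.0)  # columns of C sum to one
```

Under this construction, parameters propagate from the root down the tree while gradient information aggregates toward the root, giving the decoupled information flow that the bullets above describe.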
🔎 Similar Papers
No similar papers found.
Runze You
School of Data Science (SDS), The Chinese University of Hong Kong, Shenzhen
Shi Pu
China Telecom Guizhou Branch