Near-Optimal Decentralized Stochastic Convex Optimization over Networks

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses decentralized stochastic smooth convex optimization over a fixed communication network. The goal is to determine the largest number of workers $M$ that can be used under a total budget of $N$ gradient samples while preserving the centralized statistical rate of $O(1/\sqrt{N})$. The authors propose an algorithm that combines accelerated gossip communication, mini-batch gradients, and a one-step-delayed stochastic acceleration scheme. This design controls the residual disagreement among workers, and its guarantee depends only logarithmically on the optimum-local heterogeneity. The method preserves the rate for up to $M \lesssim \sqrt{\rho}\, N^{3/4}$ workers, where $\rho$ denotes the spectral gap of the gossip network, improving on the best prior scaling of $M \lesssim \rho \sqrt{N}$. A matching lower bound for linear-span decentralized first-order methods shows that the method is optimal up to logarithmic factors.
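To get a feel for the gap between the two scaling bounds, the short sketch below evaluates both for one illustrative setting. It is only an order-of-magnitude illustration: the $\lesssim$ notation hides constants and logarithmic factors, which are assumed here to be 1, and the function names and the values of $N$ and $\rho$ are hypothetical choices, not taken from the paper.

```python
import math


def max_workers_prior(n_samples: float, spectral_gap: float) -> float:
    """Prior scaling M ~ rho * sqrt(N); hidden constant assumed to be 1."""
    return spectral_gap * math.sqrt(n_samples)


def max_workers_new(n_samples: float, spectral_gap: float) -> float:
    """New scaling M ~ sqrt(rho) * N^(3/4); hidden constant assumed to be 1."""
    return math.sqrt(spectral_gap) * n_samples ** 0.75


# Illustrative (assumed) values: 10^8 total gradient samples, spectral gap 0.01.
N = 10**8
rho = 0.01

prior = max_workers_prior(N, rho)
new = max_workers_new(N, rho)

print(f"prior bound: about {prior:.0f} workers")
print(f"new bound:   about {new:.0f} workers")
print(f"ratio new/prior: about {new / prior:.0f}")
```

With these assumed values the prior bound allows on the order of 100 workers and the new bound on the order of 100,000. The ratio between the two is $N^{1/4}/\sqrt{\rho}$, so the improvement grows both with the sample budget and as the network becomes more poorly connected.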

📝 Abstract

We study decentralized stochastic smooth convex optimization, where $M$ workers minimize an average objective using local stochastic gradients and neighbor-only communication over a fixed gossip network. A central question in this setting is to determine the largest number of workers that can be used under a total budget of $N$ gradient samples while still preserving the centralized $O(1/\sqrt{N})$ statistical rate. We introduce an accelerated decentralized method that preserves this rate for up to $M \lesssim \sqrt{\rho}\, N^{3/4}$ workers, where $\rho$ is the spectral gap of the gossip network, improving the best prior maximal scaling of $M \lesssim \rho \sqrt{N}$. The method is based on a one-step-delayed stochastic acceleration scheme that enables workers to interleave minibatching with accelerated gossip while controlling residual disagreement, and its guarantee depends only logarithmically on the optimum-local heterogeneity. We also establish a matching lower bound for linear-span decentralized first-order methods, showing that the method is optimal up to logarithmic factors.
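For readers unfamiliar with accelerated gossip, the following is a minimal sketch of the general idea and not the paper's algorithm. It compares plain gossip averaging with a textbook momentum-accelerated variant on a ring network. The lazy ring weights (1/2 self, 1/4 per neighbor), the momentum formula, the use of $1 - \lambda_2$ as the spectral gap, and all function names are assumptions made for illustration; the paper may define and use these quantities differently.

```python
import math


def ring_gossip_step(x):
    """One neighbor-only averaging round on a ring (assumed lazy weights)."""
    n = len(x)
    return [0.5 * x[i] + 0.25 * x[i - 1] + 0.25 * x[(i + 1) % n] for i in range(n)]


def disagreement(x):
    """Largest deviation of any worker's value from the network average."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)


def plain_gossip(x0, rounds):
    x = list(x0)
    for _ in range(rounds):
        x = ring_gossip_step(x)
    return x


def accelerated_gossip(x0, rounds, lambda2):
    """Gossip with a momentum term tuned to the second-largest eigenvalue."""
    s = math.sqrt(1.0 - lambda2 ** 2)
    eta = (1.0 - s) / (1.0 + s)  # momentum weight (standard heavy-ball choice)
    prev, cur = list(x0), list(x0)
    for _ in range(rounds):
        mixed = ring_gossip_step(cur)
        nxt = [(1.0 + eta) * m - eta * p for m, p in zip(mixed, prev)]
        prev, cur = cur, nxt
    return cur


n = 20
# For the lazy ring weights above, the second-largest eigenvalue has a closed form.
lambda2 = 0.5 + 0.5 * math.cos(2.0 * math.pi / n)
spectral_gap = 1.0 - lambda2  # assumed definition of the spectral gap

x0 = [float(i) for i in range(n)]  # each worker starts from a different value
plain = plain_gossip(x0, 100)
fast = accelerated_gossip(x0, 100, lambda2)

print(f"spectral gap: {spectral_gap:.4f}")
print(f"disagreement after 100 rounds, plain gossip:       {disagreement(plain):.3e}")
print(f"disagreement after 100 rounds, accelerated gossip: {disagreement(fast):.3e}")
```

Both variants preserve the network average, but the accelerated recursion contracts disagreement at a rate governed by roughly the square root of the spectral gap rather than the gap itself. This is the usual reason accelerated gossip reduces the communication cost on poorly connected networks.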

Problem

Research questions and friction points this paper is trying to address.

decentralized optimization

stochastic convex optimization

gossip network

statistical rate

worker scaling

Innovation

Methods, ideas, or system contributions that make the work stand out.

decentralized optimization

stochastic acceleration

gossip networks