Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning

📅 2026-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high communication cost of distributed learning, where the communication complexity of existing methods typically depends on the total number of samples \(N\). The authors propose Local MixVR, a distributed optimization framework that integrates local updates with variance reduction. They show that it is the first distributed method whose communication complexity depends only on the number of worker nodes \(M\) and is independent of \(N\). In regimes where \(M < O(N^{1/4})\), the proposed method outperforms the current state-of-the-art Minibatch Accelerated SGD baseline. Under typical settings, the framework substantially reduces communication overhead and is reported to perform better empirically than strong baselines such as Minibatch Accelerated SGD.
📝 Abstract
Communication overhead is a crucial bottleneck in scalable distributed learning. While existing methods such as Local SGD, Minibatch SGD, and their accelerated variants aim to utilize data points efficiently, they still exhibit communication-round complexity that scales with the total number of samples $N$. In this paper, we introduce Local MixVR, a distributed framework that integrates local updates with variance-reduction techniques to mitigate local noise. We show that Local MixVR is the first distributed method to eliminate the dependence of communication complexity on $N$, achieving a complexity that scales only with the number of workers $M$. In common regimes where $M<O\left(N^{1/4}\right)$, Local MixVR outperforms the state-of-the-art Minibatch Accelerated SGD baseline, bridging a long-standing gap in distributed optimization and establishing a new paradigm for communication-efficient training.
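The abstract does not spell out the Local MixVR update rule, so the following is only a generic illustration of the two ingredients it names: local updates between communication rounds, and a variance-reduced gradient estimate. It uses a standard SVRG-style control variate on a toy least-squares problem. The function names (`local_vr_round`, `global_loss`), the step size, the number of local steps, and the synthetic data are all assumptions made for this sketch, not details from the paper.

```python
import numpy as np

def local_vr_round(x, shards, lr=0.05, local_steps=20, rng=None):
    """One round of local updates with an SVRG-style correction (illustrative only).

    This is NOT the Local MixVR algorithm; it is a generic sketch.
    Each round uses two exchanges: workers send their full local gradients
    at the shared point x, then send back their local iterates for averaging.
    """
    if rng is None:
        rng = np.random.default_rng(0)

    # Anchor: average of the workers' full local gradients at the shared point x.
    g_anchor = np.mean(
        [A.T @ (A @ x - b) / len(b) for A, b in shards], axis=0
    )

    local_iterates = []
    for A, b in shards:
        w = x.copy()
        for _ in range(local_steps):
            i = rng.integers(len(b))
            a_i, b_i = A[i], b[i]
            g_w = a_i * (a_i @ w - b_i)  # stochastic gradient at the local iterate
            g_x = a_i * (a_i @ x - b_i)  # same sample, evaluated at the anchor
            # Variance-reduced step: the noise in g_w is partly cancelled by g_x.
            w -= lr * (g_w - g_x + g_anchor)
        local_iterates.append(w)

    # Server averages the local iterates to form the next shared point.
    return np.mean(local_iterates, axis=0)

def global_loss(x, shards):
    """Average least-squares loss over all workers (equal shard sizes)."""
    return float(np.mean([0.5 * np.mean((A @ x - b) ** 2) for A, b in shards]))

# Synthetic, noise-free data split across M hypothetical workers.
rng = np.random.default_rng(0)
M, n_per_worker, d = 4, 100, 5
x_true = rng.normal(size=d)
shards = []
for _ in range(M):
    A = rng.normal(size=(n_per_worker, d))
    shards.append((A, A @ x_true))

x = np.zeros(d)
loss_start = global_loss(x, shards)
for _ in range(30):  # 30 rounds, each with 20 local steps per worker
    x = local_vr_round(x, shards, rng=rng)
loss_end = global_loss(x, shards)
print(f"loss before: {loss_start:.4f}, loss after: {loss_end:.6f}")
```

In this sketch the number of rounds is set by hand. The paper's contribution concerns how many such rounds are needed as a function of $M$ rather than $N$, which this toy example does not attempt to reproduce.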
Problem

Research questions and friction points this paper is trying to address.

distributed learning
communication overhead
sample dependence
communication complexity
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local MixVR
variance reduction
communication complexity
distributed learning
local updates