Distributed Random Reshuffling Methods with Improved Convergence

📅 2023-06-21
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
🤖 AI Summary
This paper addresses the distributed optimization problem of minimizing the average of local loss functions over a connected multi-agent network. To overcome the suboptimal convergence of existing distributed random reshuffling (RR) methods, we propose two novel algorithms, GT-RR and ED-RR, by integrating RR into the gradient tracking (GT) and exact diffusion (ED) frameworks, respectively, the first such integrations in the literature. For smooth nonconvex objectives, both algorithms drive the (minimum) expected squared gradient norm to zero at a rate of $O(1/[(1-\lambda)^{1/3}m^{1/3}T^{2/3}])$, where $1-\lambda$ is the spectral gap of the mixing matrix, $m$ is the number of local data samples per agent, and $T$ is the number of epochs. When the objectives further satisfy the Polyak-Łojasiewicz (PL) condition, both methods attain an objective-value error rate of $O(1/[(1-\lambda)mT^2])$, matching centralized RR up to network-dependent constants and substantially outperforming prior distributed RR variants. Our key contribution lies in the principled, tight coupling of RR with mainstream distributed optimization architectures, accompanied by the first theoretical analysis showing that such hybrid schemes recover the centralized rates.
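Spelled out, the two guarantees bound the following quantities; the notation below ($\bar{x}_t$ for the averaged iterate, $f^*$ for the global minimum, $n$ for the number of agents) is assumed here for illustration and matches the verbal descriptions in the abstract rather than the paper's exact statements:

```latex
% Smooth nonconvex case: minimum expected squared gradient norm over T epochs
\min_{0 \le t \le T-1} \mathbb{E}\,\big\|\nabla f(\bar{x}_t)\big\|^2
  = O\!\left( \frac{1}{(1-\lambda)^{1/3}\, m^{1/3}\, T^{2/3}} \right)

% Polyak-Lojasiewicz case: averaged expected function-value gap
\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[ f(x_{i,T}) - f^{*} \big]
  = O\!\left( \frac{1}{(1-\lambda)\, m\, T^{2}} \right)
```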
📝 Abstract
This paper proposes two distributed random reshuffling methods, namely Gradient Tracking with Random Reshuffling (GT-RR) and Exact Diffusion with Random Reshuffling (ED-RR), to solve the distributed optimization problem over a connected network, where a set of agents aim to minimize the average of their local cost functions. Both algorithms invoke the random reshuffling (RR) update for each agent, inherit favorable characteristics of RR for minimizing smooth nonconvex objective functions, and improve the performance of previous distributed random reshuffling methods both theoretically and empirically. Specifically, both GT-RR and ED-RR achieve the convergence rate of $O(1/[(1-\lambda)^{1/3}m^{1/3}T^{2/3}])$ in driving the (minimum) expected squared norm of the gradient to zero, where $T$ denotes the number of epochs, $m$ is the sample size for each agent, and $1-\lambda$ represents the spectral gap of the mixing matrix. When the objective functions further satisfy the Polyak-Łojasiewicz (PL) condition, we show GT-RR and ED-RR both achieve the $O(1/[(1-\lambda)mT^2])$ convergence rate in terms of the averaged expected differences between the agents' function values and the global minimum value. Notably, both results are comparable to the convergence rates of centralized RR methods (up to constant factors depending on the network topology) and outperform those of previous distributed random reshuffling algorithms.
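To make the update pattern concrete, below is a minimal Python sketch of one GT-RR-style epoch. It assumes per-inner-step communication through a doubly stochastic mixing matrix `W` (whose spectral gap is $1-\lambda$) and a per-sample gradient oracle `grad`; the function names, signatures, and exact update order are illustrative assumptions, not the paper's pseudocode.

```python
import numpy as np

def gt_rr_epoch(X, Y, W, grad, perms, alpha):
    """One illustrative GT-RR-style epoch (a sketch, not the paper's pseudocode).

    X     : (n, d) array, row i is agent i's current iterate
    Y     : (n, d) array, row i is agent i's gradient-tracking variable
    W     : (n, n) doubly stochastic mixing matrix, spectral gap 1 - lambda
    grad  : grad(i, s, x) -> gradient of agent i's s-th sample loss at x
    perms : list of n permutations of range(m), one fresh shuffle per agent
    alpha : step size
    """
    n, _ = X.shape
    m = len(perms[0])
    # Gradient at the first reshuffled sample, used as the stale term below.
    g_prev = np.stack([grad(i, perms[i][0], X[i]) for i in range(n)])
    for k in range(m):
        # Mix with neighbors and descend along the tracked direction.
        X = W @ (X - alpha * Y)
        # Next reshuffled sample; at the epoch boundary a fresh permutation
        # would be drawn -- the modulo wrap here is a simplification.
        nxt = [perms[i][(k + 1) % m] for i in range(n)]
        g_new = np.stack([grad(i, nxt[i], X[i]) for i in range(n)])
        # Gradient tracking: mix the trackers, then add fresh-minus-stale
        # sampled gradients so Y keeps estimating the network-average gradient.
        Y = W @ Y + g_new - g_prev
        g_prev = g_new
    return X, Y
```

A driver would redraw the shuffles every epoch, e.g. `perms = [np.random.permutation(m) for _ in range(n)]`, then iterate `X, Y = gt_rr_epoch(X, Y, W, grad, perms, alpha)` for `T` epochs. The tracker `Y` is what lets each agent follow an estimate of the network-average gradient even though every inner step touches only one local sample.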
Problem

Research questions and friction points this paper is trying to address.

How can random reshuffling be exploited in distributed optimization over networks?
Existing distributed random reshuffling methods converge at suboptimal rates for smooth nonconvex objectives.
Can distributed methods match the convergence rates of centralized reshuffling, up to network-dependent constants?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes GT-RR and ED-RR, integrating random reshuffling into gradient tracking and exact diffusion.
Improves the convergence rates of prior distributed random reshuffling methods.
Achieves rates comparable to centralized RR methods, up to network-dependent constants.
Kun Huang
School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
Linli Zhou
School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
Shi Pu
School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China