AI Summary
This work investigates the convergence of the Scaffold algorithm in federated learning under data heterogeneity and stochastic gradient updates, focusing on whether it achieves linear speedup in the number of clients and on how its bias evolves. We propose the first Markov-chain-based convergence analysis framework for Scaffold in the stochastic setting, rigorously establishing its linear speedup up to higher-order terms in the step size. Crucially, we identify an inherent higher-order residual bias that does not vanish as the number of clients increases, exposing a fundamental limitation of existing variance-reduction methods in federated optimization. By characterizing the evolution of the algorithm's state via the Wasserstein distance and combining control-variate techniques with stochastic optimization theory, we quantitatively decompose the bias structure. Our analysis provides foundational theoretical insights for designing unbiased, scalable stochastic federated optimization algorithms.
Abstract
This paper proposes a novel analysis of the Scaffold algorithm, a popular method for dealing with data heterogeneity in federated learning. While its convergence in deterministic settings, where local control variates mitigate client drift, is well established, the impact of stochastic gradient updates on its performance is less understood. To address this question, we first show that its global parameters and control variates define a Markov chain that converges to a stationary distribution in the Wasserstein distance. Leveraging this result, we prove that Scaffold achieves linear speedup in the number of clients up to higher-order terms in the step size. Nevertheless, our analysis reveals that Scaffold retains a higher-order bias, similar to FedAvg, that does not decrease as the number of clients increases. This highlights opportunities for developing improved stochastic federated learning algorithms.
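To make the drift-correction mechanism concrete, the sketch below shows a single client's local update in Scaffold: each stochastic gradient step is corrected by the difference between the global and local control variates, and the local control variate is refreshed at the end of the round. This is an illustrative sketch following the standard Scaffold description (with the common "Option II" control-variate update); the function name, arguments, and the toy gradient are assumptions for demonstration, not the paper's exact procedure.

```python
import numpy as np

def scaffold_client_update(x_global, c_global, c_local, grad_fn, steps, lr):
    """One round of local Scaffold updates for a single client.

    Each local stochastic gradient is corrected by (c_global - c_local),
    the control-variate difference that counteracts client drift.
    """
    y = x_global.copy()
    for _ in range(steps):
        g = grad_fn(y)                       # (stochastic) gradient at y
        y = y - lr * (g - c_local + c_global)  # drift-corrected local step
    # Control-variate refresh ("Option II" in the Scaffold paper):
    c_local_new = c_local - c_global + (x_global - y) / (steps * lr)
    return y, c_local_new

# Toy usage: a single client minimizing f(y) = (y - 3)^2 from x_global = 0.
grad_fn = lambda y: 2.0 * (y - 3.0)          # deterministic gradient for illustration
x0 = np.zeros(1)
y, c_new = scaffold_client_update(x0, np.zeros(1), np.zeros(1), grad_fn, steps=100, lr=0.1)
```

After enough local steps the iterate approaches the client's local minimizer, and the refreshed control variate records the average direction moved during the round; at the server, averaging the clients' `y` and `c_local_new` values yields the new global model and global control variate.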