🤖 AI Summary
This work addresses the slow residual decay of existing error feedback (EF)-based gradient compression methods under non-IID data, which causes gradient mismatch and training stagnation in the early rounds of federated learning. To mitigate this, the authors propose Step-Ahead Partial Error Feedback (SA-PEF), which unifies the EF and SAEF frameworks through an adjustable step-ahead coefficient α, combining partial error feedback with step-ahead correction to guarantee convergence under data heterogeneity and partial client participation. Theoretical analysis highlights the role of the residual contraction rate ρᵣ in accelerating early training and yields a strategy for selecting the optimal α. Empirical results show that SA-PEF consistently converges significantly faster than conventional EF methods across diverse models and datasets, approaching the convergence rate of Fed-SGD up to a constant factor.
📝 Abstract
Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $\alpha=0$ and step-ahead EF (SAEF) when $\alpha=1$. For non-convex objectives and $\delta$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((\eta\,\eta_0 TR)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction $\rho_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $\alpha$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.
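The $\alpha$-interpolation between EF ($\alpha=0$) and SAEF ($\alpha=1$) can be illustrated in a few lines. The following is a hypothetical single-worker sketch, assuming a top-$k$ contractive compressor and assuming the step-ahead correction shifts the gradient evaluation point by a fraction $\alpha$ of the pending residual; the helper names (`topk`, `sa_pef_round`) and the exact placement of $\alpha$ are our assumptions, not the paper's algorithm:

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries (a delta-contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def sa_pef_round(x, e, grad_fn, alpha, eta, k):
    """One SA-PEF-style round (illustrative sketch, not the paper's exact update).

    alpha = 0 reduces to plain error feedback (EF);
    alpha = 1 applies the full step-ahead correction (SAEF-style).
    """
    # Step-ahead: evaluate the gradient at a point shifted by a fraction
    # alpha of the pending residual, anticipating where the residual will
    # eventually be applied (hypothetical interpolation rule).
    g = grad_fn(x - alpha * eta * e)
    p = g + e              # error-corrected pseudo-gradient
    c = topk(p, k)         # only the compressed part is transmitted
    e_new = p - c          # residual fed back in the next round
    x_new = x - eta * c
    return x_new, e_new
```

On a toy quadratic $f(z)=\lVert z\rVert^2$, both the $\alpha=0$ (EF) and intermediate-$\alpha$ variants drive the iterates toward the optimum despite transmitting only $k$ coordinates per round, which is the behavior the residual-contraction analysis formalizes.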