FedSteer: Taming Extreme Gradient Staleness in Federated Learning with Corrective Projections and Caching

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses training instability and potential performance collapse in federated learning caused by imbalanced client participation, under which infrequently active clients contribute severely stale gradients. To mitigate this, the authors construct a low-dimensional subspace from gradients cached from recently active clients. An active client's true gradient is orthogonally projected onto this subspace to obtain optimal coordinates. For inactive clients, these coordinates are recombined with the since-updated subspace, steering their stale gradients toward the prevailing global optimization direction. A representative client selection strategy further reduces server memory overhead. Experiments show that the method significantly outperforms existing approaches across several challenging scenarios, preventing performance collapse and improving accuracy by more than 7% on certain tasks.
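The projection-and-steering step can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes three things: the subspace is spanned directly by the cached gradient vectors (stored as matrix columns), the "optimal coordinates" are the least-squares solution, and steering recombines a client's stored coordinates with the current cache. The helper names `subspace_coordinates` and `steer` are hypothetical.

```python
import numpy as np


def subspace_coordinates(basis: np.ndarray, grad: np.ndarray) -> np.ndarray:
    """Least-squares coordinates c minimizing ||basis @ c - grad||_2.

    Assumption: `basis` is a (d, k) matrix whose columns are cached client
    gradients. basis @ c is then the orthogonal projection of `grad`
    onto span(basis).
    """
    coords, *_ = np.linalg.lstsq(basis, grad, rcond=None)
    return coords


def steer(basis_now: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Hypothetical steering: recombine stored coordinates with the current cache."""
    return basis_now @ coords


# Toy example: d = 4 parameters, cache of k = 2 client gradients (columns).
rng = np.random.default_rng(0)
cache = rng.standard_normal((4, 2))

# Round t: client 0 is active; store its coordinates w.r.t. the cache.
g_true = rng.standard_normal(4)
c0 = subspace_coordinates(cache, g_true)
projection = steer(cache, c0)

# Round t+1: client 0 is inactive, but another active client refreshed
# column 1 of the cache, so the same coordinates now yield a steered update.
cache_next = cache.copy()
cache_next[:, 1] = rng.standard_normal(4)
g_steered = steer(cache_next, c0)
```

Because least squares yields the orthogonal projection, the residual `g_true - projection` is orthogonal to every cached gradient. Holding the coordinates fixed while the cached columns change is what lets an inactive client's contribution follow the refreshed cache instead of staying frozen at its last true gradient.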

📝 Abstract

Federated learning (FL) is often subject to aggregation variance if clients do not consistently participate in training rounds. While reusing stale model updates from inactive clients is a common technique to reduce this variance, we find that with skewed client participation, the resulting update staleness can become severe enough to destabilize training. To remedy this, we propose FedSteer, a novel method that constructs a gradient subspace from a cache of recent client gradients to serve as a low-dimensional representation of the current optimization landscape. FedSteer projects an active client's true gradient onto this subspace to find a set of optimal coordinates. For an inactive client, FedSteer reuses these coordinates with the subspace as it has since evolved through updates from other active clients. This process effectively "steers" outdated gradients toward the current global objective. This is complemented by a selective caching strategy that identifies a representative client subset to form the subspace, reducing server memory overhead. Experiments demonstrate that FedSteer significantly outperforms baselines, preventing performance collapse in challenging scenarios while delivering accuracy gains of over 7% in others.
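The abstract does not state how the representative subset is chosen. As one plausible illustration, the sketch below swaps in a greedy, pivoted Gram-Schmidt heuristic. It repeatedly keeps the client whose gradient is least explained by the clients already kept, so a smaller stored subset still spans most of the full cache. The function `select_representatives` and the budget `k` are hypothetical and not taken from the paper.

```python
import numpy as np


def select_representatives(grads: np.ndarray, k: int) -> list[int]:
    """Greedily pick up to k clients (rows of `grads`) that span the rest well.

    Hypothetical criterion: at each step, choose the client whose gradient has
    the largest residual after projection onto the clients chosen so far.
    """
    n = grads.shape[0]
    residual = grads.astype(float)
    chosen: list[int] = []
    for _ in range(min(k, n)):
        norms = np.linalg.norm(residual, axis=1)
        for i in chosen:
            norms[i] = -1.0  # never re-pick a client
        best = int(np.argmax(norms))
        if norms[best] <= 1e-12:
            break  # remaining gradients already lie in the chosen span
        chosen.append(best)
        # Remove the new unit direction from every residual (Gram-Schmidt step).
        q = residual[best] / norms[best]
        residual = residual - np.outer(residual @ q, q)
    return chosen


# Five clients whose gradients all lie in a 2-D plane of R^3.
grads = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [1.0, 1.0, 0.0],
])
print(select_representatives(grads, k=3))  # → [3, 1]
```

Even with a budget of three, the greedy pass stops after two clients, because every remaining gradient already lies in their span; caching only those two vectors would cover the same subspace as all five.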

Problem

Research questions and friction points this paper is trying to address.

federated learning

gradient staleness

client participation skew

aggregation variance

training instability

Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient staleness

federated learning

gradient subspace