🤖 AI Summary
In federated learning, bias-compensated compression typically relies on client-specific control variates, which violates privacy-preservation and statelessness requirements. To address this, we propose Compressed Aggregate Feedback (CAFe), a novel framework that eliminates per-client control variates and instead constructs a bias-compensating feedback mechanism from historical global aggregated updates. Grounded in distributed gradient descent (DGD), CAFe provides the first rigorous convergence guarantee for biased compression in the non-smooth, gradient-heterogeneous setting. Theoretically, its convergence bound strictly improves upon that of classical error-feedback-based DCGD. Empirically, CAFe achieves significantly higher model accuracy and more stable convergence at equally high compression ratios. By jointly addressing communication efficiency, privacy, and system statelessness, CAFe establishes a practical, theoretically grounded paradigm for real-world federated learning deployments.
📝 Abstract
Distributed learning algorithms, such as the ones employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads. The compression methods used in practice are often biased and require error feedback to achieve convergence when the compression is aggressive. In turn, error feedback requires client-specific control variates, which directly contradicts privacy-preserving principles and requires stateful clients. In this paper, we propose Compressed Aggregate Feedback (CAFe), a novel distributed learning framework that allows highly compressible client updates by exploiting past aggregated updates, and does not require control variates. We consider Distributed Gradient Descent (DGD) as a representative algorithm and provide a theoretical proof of CAFe's superiority to Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-smooth regime with bounded gradient dissimilarity. Experimental results confirm that CAFe consistently outperforms distributed learning with direct compression and highlight the compressibility of the client updates with CAFe.
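To make the idea concrete, here is a minimal, hypothetical sketch of one aggregate-feedback round, based only on the abstract's description: each client compresses the difference between its local update and the globally shared past aggregate (so no per-client control variate is stored), and the server adds the shared aggregate back before averaging. The function names, the top-k compressor, and the exact feedback rule are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def top_k(v, k):
    """Biased top-k sparsifier: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def cafe_round(client_grads, prev_agg, k, lr=0.1):
    """One hypothetical CAFe-style round (illustrative, not the paper's exact rule).

    Each client sends a compressed residual relative to the shared past
    aggregate `prev_agg`; because that aggregate is global, clients stay
    stateless and no client-specific control variate is needed.
    """
    compressed = [top_k(-lr * g - prev_agg, k) for g in client_grads]
    # The server reconstructs each update by adding the shared aggregate back,
    # then averages to form the new global aggregated update.
    return prev_agg + np.mean(compressed, axis=0)
```

Because consecutive client updates tend to be close to the previous global aggregate, the residual being compressed is small, which is what makes aggressive compression viable without per-client error buffers.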