🤖 AI Summary
Large uplink communication overhead severely constrains federated learning (FL), especially under the asymmetric bandwidth limits of edge devices. Biased compression methods are highly efficient in practice, but existing approaches rely on client-side error feedback, which violates privacy preservation and the stateless-client design common in large-scale FL. This paper proposes CAFe and its extension CAFe-S, the first biased-compression frameworks that require no client-stored control variates. CAFe uses the previous round's globally aggregated update as a shared control variate for all clients; CAFe-S additionally leverages a small server-side private dataset to generate a guiding candidate update. Taking distributed gradient descent as the representative algorithm, the analysis covers error reconstruction, candidate update generation, and non-convex convergence. Theoretically, CAFe is proven superior to DCGD with biased compression, while CAFe-S converges to a stationary point at a rate that improves as the server's data become more representative. Experiments demonstrate substantial gains over state-of-the-art compression schemes.
📝 Abstract
Distributed learning, particularly Federated Learning (FL), faces a significant communication bottleneck, especially in the uplink transmission of client-to-server updates, which is often constrained by asymmetric bandwidth limits at the edge. Biased compression techniques are effective in practice, but require error feedback mechanisms to provide theoretical guarantees and to ensure convergence when compression is aggressive. Standard error feedback, however, relies on client-specific control variates, which violates user privacy and is incompatible with the stateless clients common in large-scale FL. This paper proposes two novel frameworks that enable biased compression without client-side state or control variates. The first, Compressed Aggregate Feedback (CAFe), uses the globally aggregated update from the previous round as a shared control variate for all clients. The second, Server-Guided Compressed Aggregate Feedback (CAFe-S), extends this idea to scenarios where the server possesses a small private dataset; it generates a server-guided candidate update to be used as a more accurate predictor. We consider Distributed Gradient Descent (DGD) as a representative algorithm and analytically prove CAFe's superiority to Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-convex regime with bounded gradient dissimilarity. We further prove that CAFe-S converges to a stationary point, with a rate that improves as the server's data become more representative. Experimental results in FL scenarios validate the superiority of our approaches over existing compression schemes.
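To make the CAFe mechanism concrete, here is a minimal sketch of one round as we read the abstract: each client compresses the *difference* between its local update and the previous round's global aggregate (the shared control variate), and the server adds the variate back before averaging. This is an illustrative reconstruction, not the paper's implementation; top-k stands in for an arbitrary biased compressor, and all function names (`top_k`, `cafe_round`) are ours.

```python
import numpy as np

def top_k(v, k):
    """Example biased compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def cafe_round(local_updates, prev_global_update, k):
    """One CAFe-style round (our sketch): clients compress
    (update - shared control variate); the server adds the
    variate back and averages the decoded updates."""
    decoded = [top_k(u - prev_global_update, k) + prev_global_update
               for u in local_updates]
    return np.mean(decoded, axis=0)  # new global aggregated update

# Toy usage: two clients whose updates stay close to the previous aggregate,
# so the compressed difference carries little error.
rng = np.random.default_rng(0)
prev = rng.normal(size=6)
updates = [prev + 0.1 * rng.normal(size=6) for _ in range(2)]
new_global = cafe_round(updates, prev, k=2)

# Per-client reconstruction error: CAFe-style vs. plain DCGD-style compression.
err_cafe = np.mean([np.linalg.norm(top_k(u - prev, 2) + prev - u)
                    for u in updates])
err_dcgd = np.mean([np.linalg.norm(top_k(u, 2) - u) for u in updates])
```

When local updates are correlated with the previous aggregate, compressing the difference discards small residual components instead of large raw coordinates, which is the intuition behind using the aggregate as a shared, client-stateless control variate.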