🤖 AI Summary
To address the dual challenges of high uplink bandwidth overhead and gradient privacy leakage in federated learning, this paper proposes the Multi-Projection Directional Derivative (MPDD) compression framework. Clients compute directional derivatives of local gradients along multiple random vectors, enabling nonlinear encoding of high-dimensional gradients into a low-dimensional space; the server reconstructs and aggregates gradients via geometric projection. MPDD is the first to introduce multi-directional derivative projection for gradient compression, overcoming the convergence bottleneck inherent in single-projection methods. We theoretically establish an $O(1/\sqrt{K})$ convergence rate—matching that of FedSGD. Moreover, the geometric ambiguity induced by random projections provides intrinsic gradient privacy protection. Communication complexity is reduced from $O(d)$ to $O(m)$, where $m \ll d$. Extensive experiments on CIFAR-10/100 and Tiny-ImageNet validate MPDD’s convergence, robustness, and strong resilience against gradient inversion attacks, while enabling flexible privacy–accuracy trade-offs.
📝 Abstract
This paper introduces \texttt{FedMPDD} (\textbf{Fed}erated Learning via \textbf{M}ulti-\textbf{P}rojected \textbf{D}irectional \textbf{D}erivatives), a novel algorithm that simultaneously optimizes bandwidth utilization and enhances privacy in Federated Learning. The core idea of \texttt{FedMPDD} is to encode each client's high-dimensional gradient by computing its directional derivatives along multiple random vectors. This compresses the gradient into a much smaller message, significantly reducing uplink communication costs from $\mathcal{O}(d)$ to $\mathcal{O}(m)$, where $m \ll d$. The server then decodes the aggregated information by projecting it back onto the same random vectors. Our key insight is that averaging multiple projections overcomes the dimension-dependent convergence limitations of a single projection. We provide a rigorous theoretical analysis, establishing that \texttt{FedMPDD} converges at a rate of $\mathcal{O}(1/\sqrt{K})$, matching the performance of FedSGD. Furthermore, we demonstrate that our method provides some inherent privacy against gradient inversion attacks due to the geometric properties of low-rank projections, offering a tunable privacy-utility trade-off controlled by the number of projections. Extensive experiments on benchmark datasets validate our theory and demonstrate the effectiveness of our method.
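To make the encode/decode idea concrete, here is a minimal NumPy sketch of the projection scheme as described in the abstract. All names (`encode`, `decode`) and the specific reconstruction formula are illustrative assumptions, not the paper's actual implementation: we assume unit random directions drawn uniformly on the sphere, for which $\mathbb{E}[vv^\top] = I/d$, so $(d/m)\sum_j \langle g, v_j\rangle v_j$ is an unbiased estimator of the gradient $g$.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(grad, directions):
    # For a loss L, the directional derivative along v_j is <grad L, v_j>,
    # so the client's uplink message is just m inner products: O(m) cost.
    return directions @ grad                         # shape (m,)

def decode(coeffs, directions, d):
    # Assumed reconstruction: with v_j uniform on the unit sphere,
    # E[v v^T] = I/d, so (d/m) * sum_j c_j v_j is unbiased for the gradient.
    m = coeffs.shape[0]
    return (d / m) * (directions.T @ coeffs)         # shape (d,)

d, m = 10_000, 256                                   # m << d
g = rng.standard_normal(d)                           # stand-in client gradient

V = rng.standard_normal((m, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)        # unit random directions

c = encode(g, V)                                     # client sends m scalars
g_hat = decode(c, V, d)                              # server reconstructs

# Unbiased but noisy: alignment with the true gradient grows with m,
# which is the lever behind the privacy-utility trade-off.
cos = g @ g_hat / (np.linalg.norm(g) * np.linalg.norm(g_hat))
```

With these dimensions the reconstruction recovers only the component of $g$ lying in the $m$-dimensional random subspace (cosine similarity roughly $\sqrt{m/d}$), which illustrates both why a single projection converges slowly and why the low-rank ambiguity frustrates gradient inversion.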