🤖 AI Summary
In heterogeneous-data nonconvex composite federated learning, existing methods suffer from high communication overhead, severe client drift, and strong coupling between proximal operator computation and communication. Method: We propose a decoupled sparse-communication framework that separates local proximal operator computation from client-server communication, transmitting only a single $d$-dimensional vector per round. Building on distributed stochastic proximal gradient descent, we design an algorithm that, to our knowledge, is the first to achieve such decoupling in the nonconvex, nonsmooth composite setting. Contribution/Results: We establish sublinear convergence under general nonconvexity and linear convergence to a bounded residual under the proximal Polyak-Łojasiewicz (PL) condition. Experiments on synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines, reducing communication cost, improving convergence stability, and effectively mitigating client drift.
📝 Abstract
We propose an innovative algorithm for non-convex composite federated learning that decouples the proximal operator evaluation from the communication between server and clients. Moreover, each client uses local updates to communicate less frequently with the server, sends only a single $d$-dimensional vector per communication round, and overcomes issues with client drift. In the analysis, challenges arise from the decoupling strategy and the local updates in the algorithm, as well as from the non-convex and non-smooth nature of the problem. We establish sublinear and linear convergence to a bounded residual error under general non-convexity and the proximal Polyak-Łojasiewicz inequality, respectively. In numerical experiments, we demonstrate the superiority of our algorithm over state-of-the-art methods on both synthetic and real datasets.
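To make the composite setting concrete, the sketch below illustrates the basic building blocks the abstract refers to: a proximal operator (here soft-thresholding for an $\ell_1$ regularizer, chosen purely as an example) and a client running several local proximal stochastic-gradient steps before sending a single $d$-dimensional update vector to the server. This is a hedged, minimal illustration of these generic ingredients, not the paper's actual algorithm; the function names, the $\ell_1$ choice, and the step-size values are all assumptions for demonstration.

```python
import numpy as np

def prox_l1(v, step, lam):
    # Proximal operator of lam * ||x||_1 with step size `step`
    # (soft-thresholding), a standard example of a nonsmooth prox.
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

def local_prox_sgd_round(x, grad, step, lam, tau):
    # Run `tau` local proximal-SGD steps starting from the server model x,
    # then return the single d-dimensional difference vector that a client
    # would transmit in one communication round (illustrative only).
    x_local = x.copy()
    for t in range(tau):
        g = grad(x_local, t)  # stochastic gradient of the smooth part
        x_local = prox_l1(x_local - step * g, step, lam)
    return x_local - x

# Toy usage: smooth part 0.5*||x - b||^2 plus 0.2*||x||_1 regularizer.
b = np.array([1.0, -2.0, 0.1])
grad = lambda x, t: x - b  # exact gradient stands in for a stochastic one
delta = local_prox_sgd_round(np.zeros(3), grad, step=0.5, lam=0.2, tau=10)
```

In this toy problem the local iterates contract toward the regularized solution `[0.8, -1.8, 0.0]` (the soft-thresholded version of `b`), so `delta` is the compressed summary of ten local steps sent in one round, matching the "single $d$-dimensional vector per round" idea.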