🤖 AI Summary
To address the challenges of high statistical heterogeneity, diverse communication topologies, and stringent privacy constraints in cross-organizational federated learning (FL), this paper proposes a generic distributed optimization algorithm grounded in the augmented Lagrangian framework. Methodologically, it integrates proximal relaxation with quadratic approximation techniques, enabling unified convergence analysis for variants including proximal gradient descent and stochastic gradient descent—supporting both centralized and decentralized topologies, asynchronous communication, and non-IID data. Theoretically, it establishes the first general convergence analysis framework compatible with multiple FL architectures and termination criteria. Empirically, the algorithm achieves significantly faster convergence and improved communication efficiency in large-scale, highly heterogeneous settings, while demonstrating robustness and practical applicability.
📝 Abstract
Federated Learning (FL), as a distributed collaborative Machine Learning (ML) framework under privacy-preserving constraints, has garnered increasing research attention in cross-organizational data collaboration scenarios. This paper proposes a class of distributed optimization algorithms based on the augmented Lagrangian technique, designed to accommodate diverse communication topologies in both centralized and decentralized FL settings. Furthermore, we develop multiple termination criteria and parameter update mechanisms to enhance computational efficiency, accompanied by rigorous theoretical guarantees of convergence. By generalizing the augmented Lagrangian relaxation through the incorporation of proximal relaxation and quadratic approximation, our framework systematically recovers a broad of classical unconstrained optimization methods, including proximal algorithm, classic gradient descent, and stochastic gradient descent, among others. Notably, the convergence properties of these methods can be naturally derived within the proposed theoretical framework. Numerical experiments demonstrate that the proposed algorithm exhibits strong performance in large-scale settings with significant statistical heterogeneity across clients.