AI Summary
To address memory, communication, and computational bottlenecks in federated fine-tuning of large language models (LLMs) on resource-constrained edge devices, this paper proposes Federated Split-Perturbation Zero-order Optimization (FedSPZO). FedSPZO splits the network into two blocks and assigns a different number of zeroth-order gradient-estimation perturbations to each block, combining this split-perturbation strategy with task alignment. It converges faster than prior zeroth-order approaches while keeping memory requirements at the level of inference. Compared to state-of-the-art zeroth-order federated methods, FedSPZO reduces computational cost by 2.5–7×, enabling efficient, privacy-preserving LLM fine-tuning on low-power edge devices. The work narrows the gap between the tight resource budgets of edge devices and the training performance expected from federated fine-tuning.
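The inference-level memory claim rests on zeroth-order gradient estimation, which uses only forward passes. Below is a minimal sketch, assuming a two-point (SPSA-style) estimator whose Gaussian direction is regenerated from an integer seed instead of being kept, as in MeZO-style methods. The names `perturb_` and `zo_step_` are hypothetical, and this is not FedSPZO's own code.

```python
import numpy as np


def perturb_(params: np.ndarray, seed: int, scale: float) -> None:
    """Add scale * z to params in place; z ~ N(0, I) is rebuilt from `seed`."""
    z = np.random.default_rng(seed).standard_normal(params.shape)
    params += scale * z


def zo_step_(params, loss_fn, seed, eps=1e-3, lr=1e-2):
    """One two-point zeroth-order (SPSA-style) SGD step, applied in place.

    Only forward evaluations of loss_fn are used; the direction z is not
    kept between calls but regenerated from the integer seed.
    """
    perturb_(params, seed, +eps)        # theta + eps*z
    loss_plus = loss_fn(params)
    perturb_(params, seed, -2 * eps)    # theta - eps*z
    loss_minus = loss_fn(params)
    perturb_(params, seed, +eps)        # back to theta
    # Scalar estimate of the directional derivative z . grad L(theta).
    proj_grad = (loss_plus - loss_minus) / (2 * eps)
    perturb_(params, seed, -lr * proj_grad)  # theta -= lr * proj_grad * z
    return proj_grad


# Toy usage: minimize a quadratic with zeroth-order steps only.
target = np.arange(10, dtype=float)


def loss(p):
    return float(np.sum((p - target) ** 2))


theta = np.zeros(10)
for step in range(500):
    zo_step_(theta, loss, seed=step)
print(loss(theta))
```

Because each step needs only the seed and one scalar, no activations are stored for backpropagation, which is what keeps memory close to that of inference.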
Abstract
Federated fine-tuning offers a promising approach for tuning Large Language Models (LLMs) on edge devices while preserving data privacy. However, fine-tuning these models on edge devices remains challenging due to high memory, communication, and computational demands. Zero-order optimization with task alignment provides a potential solution: it enables fine-tuning with inference-level memory requirements, but it requires a longer convergence time. In this paper, we propose Federated Split-Perturbation Zero-order Optimization (FedSPZO), which divides the network into two blocks and applies a different number of perturbations to each block in a computationally efficient way, achieving faster convergence. Our evaluation shows a $2.5$–$7\times$ reduction in computation overhead compared to state-of-the-art zero-order techniques in federated learning.
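The split-perturbation idea can be illustrated on a toy two-block model. This is a sketch under stated assumptions, not the authors' implementation: the blocks (`block1`, a tanh layer, and `block2`, a linear layer), the function `split_zo_grads`, and the cost-saving step of reusing block 1's output across all block-2 perturbations are hypothetical choices that show one way to apply a different number of perturbations per block (`p1`, `p2`). Perturbation directions are stored explicitly here for readability rather than regenerated from seeds.

```python
import numpy as np


def block1(x, w1):
    # Hypothetical first block: a single tanh layer.
    return np.tanh(x @ w1)


def block2(h, w2):
    # Hypothetical second block: a linear output layer.
    return h @ w2


def mse(pred, y):
    return float(np.mean((pred - y) ** 2))


def split_zo_grads(x, y, w1, w2, p1, p2, eps, rng):
    """Two-point zeroth-order gradient estimates, averaged over p1
    perturbations of block 1 and p2 perturbations of block 2."""
    g1 = np.zeros_like(w1)
    g2 = np.zeros_like(w2)
    # Block 1: perturbing w1 changes its output, so each direction
    # needs two full forward passes (block 1 + block 2).
    for _ in range(p1):
        z = rng.standard_normal(w1.shape)
        lp = mse(block2(block1(x, w1 + eps * z), w2), y)
        lm = mse(block2(block1(x, w1 - eps * z), w2), y)
        g1 += (lp - lm) / (2 * eps) * z / p1
    # Block 2 (assumed saving): block 1's output is computed once and
    # reused, so each extra direction only costs two passes through block 2.
    h = block1(x, w1)
    for _ in range(p2):
        z = rng.standard_normal(w2.shape)
        lp = mse(block2(h, w2 + eps * z), y)
        lm = mse(block2(h, w2 - eps * z), y)
        g2 += (lp - lm) / (2 * eps) * z / p2
    return g1, g2


# Toy usage: one zeroth-order SGD step with more perturbations for block 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4))
y = rng.standard_normal((32, 1))
w1 = 0.5 * rng.standard_normal((4, 6))
w2 = 0.5 * rng.standard_normal((6, 1))
g1, g2 = split_zo_grads(x, y, w1, w2, p1=1, p2=4, eps=1e-3, rng=rng)
w1 -= 1e-2 * g1
w2 -= 1e-2 * g2
```

In this assumed scheme, each block-1 perturbation costs two full forward passes, while each block-2 perturbation costs only two block-2 passes over the cached activations `h`, so raising `p2` is cheap relative to adding full-model perturbations.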