🤖 AI Summary
In federated learning, suboptimal local batch size configurations across clients, particularly when clients train independently on shared hardware, severely degrade global convergence efficiency. To address this, we propose a cross-device collaborative optimization method that dynamically adapts each client's local batch size via greedy randomized search, without exchanging raw data or model gradients. The approach preserves privacy and communication efficiency while maximizing hardware resource utilization, and it leverages the parallelism inherent to federated training to jointly optimize training throughput and system overhead. Extensive experiments show that the method significantly accelerates global model convergence compared to default fixed batch sizes, approaching the performance attainable through per-client hyperparameter tuning. It thus substantially improves both the training efficiency and the practical deployability of federated learning systems.
📝 Abstract
Federated Learning (FL) is a decentralized, collaborative machine learning framework for training models without collecting data in a centralized location. It has seen application across various disciplines, from supporting medical diagnoses in hospitals to detecting fraud in financial transactions. In this paper, we focus on improving the local training process through hardware usage optimization. Participants in a federation may share the hardware they train on, yet because they exchange no information with one another, an improper training configuration can hinder their training. Taking advantage of the parallel processing inherent to Federated Learning, we use a greedy randomized search to find the best local batch size settings across all participants. Our results show that, compared with default parameter settings, our method improves convergence speed while staying nearly on par with the case where each client's local parameters are individually optimized.
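To make the search procedure concrete, here is a minimal sketch of a greedy randomized search over per-client batch sizes. The objective function, capacity constant, and candidate sizes are all illustrative assumptions, not taken from the paper; in the actual method the score would come from measured training throughput and overhead on the shared hardware.

```python
import random

def throughput(batch_sizes, capacity=4096):
    """Toy objective (an assumption for illustration): per-client
    throughput grows sublinearly with batch size, but clients sharing
    hardware pay a penalty once their combined footprint exceeds a
    shared capacity."""
    total = sum(batch_sizes)
    penalty = max(0, total - capacity) * 2
    return sum(b ** 0.5 for b in batch_sizes) * 100 - penalty

def greedy_randomized_search(n_clients, candidates, iters=500, seed=0):
    """Greedy randomized search: repeatedly perturb one random
    client's batch size and keep the change only if the global
    score improves."""
    rng = random.Random(seed)
    config = [candidates[0]] * n_clients   # start from a default batch size
    best = throughput(config)
    for _ in range(iters):
        i = rng.randrange(n_clients)       # pick a client at random
        trial = config.copy()
        trial[i] = rng.choice(candidates)  # try a random candidate size
        score = throughput(trial)
        if score > best:                   # greedy: accept only improvements
            config, best = trial, score
    return config, best
```

Because each trial only needs the scalar score for a candidate configuration, no raw data or gradients ever leave a client; the search coordinates purely over batch size settings.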