AI Summary
This work addresses bandwidth contention in multi-server federated learning, where overlapping client coverage and uncoordinated client selection often lead to resource conflicts that degrade training efficiency or cause training to fail outright. To mitigate this, the authors propose a decentralized, reinforcement-learning-driven client selection mechanism that uses a categorical hidden Markov model to predict conflict risk. A fairness-aware reward function is designed to incentivize sustained client participation. Operating without centralized coordination, the proposed approach is reported to avoid cross-server resource conflicts, accelerate model convergence, reduce communication overhead, and keep participation equitable across clients.
Abstract
Federated learning (FL) has emerged as a promising distributed machine learning (ML) paradigm that enables collaborative model training across clients without exposing raw data, thereby preserving user privacy and reducing communication costs. Despite these benefits, traditional single-server FL suffers from high communication latency because a single server must aggregate models from a large number of clients. While multi-server FL distributes workloads across edge servers, overlapping client coverage and uncoordinated selection often lead to resource contention, causing bandwidth conflicts and training failures. To address these limitations, we propose a decentralized reinforcement learning framework with conflict risk prediction, named RL-CRP, to optimize client selection in multi-server FL systems. Specifically, each server estimates the likelihood of client selection conflicts using a categorical hidden Markov model based on its sparse historical client selection sequence. A fairness-aware reward mechanism is then incorporated to promote long-term client participation while minimizing training latency and resource contention. Extensive experiments demonstrate that the proposed RL-CRP framework effectively reduces inter-server conflicts and significantly improves training efficiency in terms of convergence speed and communication cost.
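To make the conflict risk prediction step concrete, the following is a minimal Python sketch of how a server might turn one client's selection history into a conflict probability using the forward recursion of a categorical hidden Markov model. It is an illustration under stated assumptions, not the RL-CRP implementation: the two hidden states, the three observation symbols, the function name `predict_conflict_risk`, and all probability values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def predict_conflict_risk(
    observations: Sequence[int],
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
    conflict_state: int = 1,
) -> float:
    """Return P(next hidden state == conflict_state | observations).

    Runs the normalized forward recursion of a categorical HMM and then
    reads off the one-step-ahead state distribution.
    """
    n_states = len(initial)
    belief = list(initial)  # distribution over the hidden state before any observation
    for obs in observations:
        # Correction: weight each state by how likely it is to emit this symbol.
        weighted = [belief[s] * emission[s][obs] for s in range(n_states)]
        total = sum(weighted)
        if total == 0.0:
            raise ValueError("observation has zero probability under the model")
        posterior = [w / total for w in weighted]
        # Prediction: push the posterior one step through the transition matrix.
        belief = [
            sum(posterior[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return belief[conflict_state]


# Hypothetical model for one client, as seen by one server.
# Hidden states: 0 = client uncontended, 1 = client contended by another server.
# Observation symbols: 0 = selected and upload succeeded,
#                      1 = selected and upload conflicted,
#                      2 = not selected this round (no direct evidence).
initial = [0.8, 0.2]
transition = [[0.9, 0.1], [0.4, 0.6]]
emission = [[0.60, 0.05, 0.35], [0.10, 0.50, 0.40]]

history = [2, 0, 2, 1, 1]  # sparse history: mostly unselected, two recent conflicts
risk = predict_conflict_risk(history, initial, transition, emission)
```

In a selection policy of this kind, a value such as `risk` could be supplied to the reinforcement learning agent as part of its state or used to penalize selecting clients that are likely to be contended. How RL-CRP actually consumes the prediction is not specified in the abstract.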