AI Summary
This work addresses the challenges of Split Federated Learning (SFL) in heterogeneous device environments, where fixed model splitting points often overload resource-constrained clients, increase communication overhead, and destabilize training. To overcome these limitations, the authors propose QSplitFL, a framework that constructs lightweight client state representations based on hardware metrics, incorporates a decay-aware loss-reduction reward mechanism, and employs a committee-voting-based deep Q-network (DQN) to dynamically select optimal splitting points. This approach jointly optimizes convergence speed and resource adaptability while mitigating reward manipulation and enhancing generalization. Experimental results on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 demonstrate that QSplitFL significantly outperforms existing baselines, achieving faster convergence, higher accuracy, and improved robustness to device heterogeneity.
Abstract
Federated Learning (FL) combined with Split Learning (SL) is a privacy-preserving paradigm that enables training deep neural networks (DNNs) on resource-constrained devices while reducing overall training cost. However, determining the optimal split point, meaning the layer at which the model is divided, remains a critical challenge, especially when clients have heterogeneous hardware capabilities. Fixed split points can overload weak devices and increase communication and server load, which slows convergence and reduces stability. This paper introduces QSplitFL, a novel capability-aware Deep Q-Network (DQN) framework for optimal split point selection in Split Federated Learning (SFL) environments. Unlike existing approaches that rely on high-dimensional model weight representations, QSplitFL employs a lightweight state representation derived directly from client hardware metrics, including CPU utilization, memory, battery level, and network latency. The proposed framework incorporates a decayed loss-drop reward function that prioritizes early convergence, and a committee-based DQN architecture with majority voting to mitigate reward hacking. Extensive experiments on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets using CNN, ResNet50, MobileNetV4, and ConvNeXt architectures demonstrate that our approach achieves better convergence and higher accuracy than existing methods, while effectively adapting to heterogeneous device resources. The source code is publicly available at https://github.com/AIPO-Lab/QSplitFL.
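The three components named in the abstract (hardware-metric state, decayed loss-drop reward, committee majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names, the normalization of metrics to [0, 1], the exponential form of the decay, the default decay factor of 0.95, and the tie-breaking rule are all assumptions made for illustration. The actual code is in the repository linked above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ClientState:
    """Hardware metrics listed in the abstract; scaling to [0, 1] is an assumption."""
    cpu_utilization: float   # fraction of CPU in use
    memory_free: float       # fraction of memory available
    battery_level: float     # fraction of battery remaining
    network_latency: float   # normalized latency

    def as_vector(self):
        # Low-dimensional state vector, in place of high-dimensional model weights.
        return [self.cpu_utilization, self.memory_free,
                self.battery_level, self.network_latency]


def decayed_loss_drop_reward(prev_loss, curr_loss, round_idx, decay=0.95):
    """Loss drop scaled by decay**round_idx (assumed form), so that
    improvements in early rounds earn a larger reward than later ones."""
    return (decay ** round_idx) * (prev_loss - curr_loss)


def majority_vote(q_values_per_member):
    """Each committee member votes for the split point with its highest
    Q-value; the most-voted split point wins. Ties go to the lowest index
    (an assumed tie-breaking rule)."""
    votes = [max(range(len(q)), key=q.__getitem__) for q in q_values_per_member]
    counts = Counter(votes)
    top = max(counts.values())
    return min(action for action, count in counts.items() if count == top)
```

In this sketch, a single member whose Q-values have drifted toward an exploitable action is outvoted by the rest of the committee, which is the intuition behind using voting to limit reward hacking.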