🤖 AI Summary
In cross-institutional federated learning (FL), frequent exchange of model weights between clients and the server is highly susceptible to network congestion, resulting in high synchronization latency and degraded training efficiency. To address this, we propose an SDN-driven dynamic routing framework tailored for FL. Leveraging SDN’s global network visibility, our approach innovatively integrates FL’s inherent communication periodicity and asynchrony, and designs a lightweight, network-state-aware path scheduling algorithm that enables adaptive routing optimization during training. The framework significantly reduces parameter synchronization latency: under a 50-node topology, it achieves 47% and 41% lower synchronization time compared to shortest-path and capacity-aware routing, respectively. It incurs minimal computational overhead, exhibits strong scalability, and demonstrates practical deployability in real-world FL deployments.
📝 Abstract
Cross-silo Federated Learning (FL) enables multiple institutions to collaboratively train machine learning models while preserving data privacy. In such settings, clients repeatedly exchange model weights with a central server, making the overall training time highly sensitive to network performance. However, conventional routing methods often fail to prevent congestion, leading to increased communication latency and prolonged training. Software-Defined Networking (SDN), which provides centralized and programmable control over network resources, offers a promising way to address this limitation. To this end, we propose SmartFLow, an SDN-based framework designed to enhance communication efficiency in cross-silo FL. SmartFLow dynamically adjusts routing paths in response to changing network conditions, thereby reducing congestion and improving synchronization efficiency. Experimental results show that SmartFLow decreases parameter synchronization time by up to 47% compared to shortest-path routing and 41% compared to capacity-aware routing. Furthermore, it achieves these gains with minimal computational overhead and scales effectively to networks of up to 50 clients, demonstrating its practicality for real-world FL deployments.