🤖 AI Summary
This work addresses two challenges that hinder convergence in asynchronous federated learning, gradient staleness and bias toward fast clients, for which closed-form characterizations of update throughput and staleness have been lacking. Introducing, for the first time, a product-form stochastic queueing-network framework, the paper jointly models the stochastic computation and communication delays of the clients and the server, yielding a closed-form expression for the update throughput together with closed-form upper bounds on the communication round complexity and wall-clock time of Generalized AsyncSGD. The analysis further uncovers a multi-way trade-off among convergence speed, gradient staleness, and energy efficiency. Building on these insights, the authors propose a gradient-driven routing and concurrency optimization strategy that reduces convergence time by 29%–46% and energy consumption by 36%–49% relative to AsyncSGD on the EMNIST benchmark.
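To make the staleness mechanism concrete, here is a toy sketch (not from the paper, all parameters illustrative): a server applies gradients the moment they arrive, while each client computes its gradient on a stale copy of the model. The staleness of an update is the number of server updates since that client last pulled the model, which is exactly the quantity that degrades convergence in asynchronous SGD.

```python
import random

def async_sgd_toy(num_clients=4, num_updates=200, lr=0.1, seed=0):
    """Toy asynchronous SGD on f(w) = 0.5 * w**2 (so grad f(w) = w).

    Each client holds a stale copy of the model; the server applies
    gradients on arrival and records each update's staleness (number
    of server updates since the client last pulled the model).
    """
    rng = random.Random(seed)
    w = 5.0                          # server model (scalar for illustration)
    version = 0                      # server update counter
    client_w = [w] * num_clients     # each client starts with a fresh copy
    client_ver = [0] * num_clients
    staleness_log = []
    for _ in range(num_updates):
        # Which client reports next; uniform here, heterogeneous in practice.
        c = rng.randrange(num_clients)
        grad = client_w[c]           # gradient evaluated at the stale copy
        staleness_log.append(version - client_ver[c])
        w -= lr * grad               # server applies the (possibly stale) gradient
        version += 1
        client_w[c], client_ver[c] = w, version  # client pulls the fresh model
    return w, sum(staleness_log) / len(staleness_log)
```

With a small step size the iterate still contracts toward the optimum despite positive average staleness; the paper's contribution is characterizing, in closed form, how the distribution of this staleness depends on the underlying queueing dynamics rather than treating it as an exogenous bound.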
📝 Abstract
Synchronous federated learning scales poorly due to the straggler effect. Asynchronous algorithms increase the update throughput by processing updates upon arrival, but they introduce two fundamental challenges: gradient staleness, which degrades convergence, and bias toward faster clients under heterogeneous data distributions. Although algorithms such as AsyncSGD and Generalized AsyncSGD mitigate this bias via client-side task queues, most existing analyses neglect the underlying queueing dynamics and lack closed-form characterizations of the update throughput and gradient staleness. To close this gap, we develop a stochastic queueing-network framework for Generalized AsyncSGD that jointly models random computation times at the clients and the central server, as well as random uplink and downlink communication delays. Leveraging product-form network theory, we derive a closed-form expression for the update throughput, alongside closed-form upper bounds for both the communication round complexity and the expected wall-clock time required to reach an $\epsilon$-stationary point. These results formally characterize the trade-off between gradient staleness and wall-clock convergence speed. We further extend the framework to quantify energy consumption under stochastic timing, revealing an additional trade-off between convergence speed and energy efficiency. Building on these analytical results, we propose gradient-based optimization strategies to jointly optimize routing and concurrency. Experiments on EMNIST demonstrate reductions of 29%--46% in convergence time and 36%--49% in energy consumption compared to AsyncSGD.
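The paper's throughput result rests on product-form network theory. As a generic illustration of that machinery (this is standard exact Mean Value Analysis for closed product-form networks, not the paper's own derivation), the sketch below computes the update throughput of a closed network, where a fixed number of in-flight tasks circulate through stations such as client compute, uplink, server aggregation, and downlink, as a function of the concurrency level. The station rates and visit ratios are hypothetical.

```python
def mva_throughput(service_rates, visit_ratios, n_jobs):
    """Exact Mean Value Analysis for a closed product-form queueing network.

    `n_jobs` in-flight tasks circulate through single-server FCFS stations
    with the given service rates and visit ratios. Returns the system
    throughput (updates per unit time) at each concurrency level 1..n_jobs.
    """
    m = len(service_rates)
    queue_len = [0.0] * m             # mean queue length at each station
    throughputs = []
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the network with n-1 jobs.
        resp = [(1.0 + queue_len[i]) / service_rates[i] for i in range(m)]
        x = n / sum(visit_ratios[i] * resp[i] for i in range(m))
        queue_len = [x * visit_ratios[i] * resp[i] for i in range(m)]
        throughputs.append(x)
    return throughputs
```

Throughput rises with concurrency but saturates at the bottleneck station's rate, while queueing (and hence staleness) keeps growing, which is the shape of the throughput-staleness trade-off the abstract describes.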