AI Summary
This work addresses the communication inefficiency of hierarchical federated learning, where uplink bandwidth and latency constraints are poorly served by existing single-bit compression methods in two-tier edge-cloud architectures. To this end, the authors propose HierSignSGD, a novel framework in which devices upload only gradient signs, edge servers aggregate them via majority voting, and the cloud periodically averages edge models and broadcasts a quantized global model downstream. The study provides the first theoretical analysis of how sign compression, the two-level aggregation intervals, and cross-cluster heterogeneity jointly affect convergence, thereby bridging the theoretical and algorithmic gap of SignSGD in hierarchical settings. Experiments demonstrate that HierSignSGD matches or exceeds the accuracy of full-precision SGD under both homogeneous and heterogeneous data distributions with minimal communication overhead, while remaining robust to aggressive downlink sparsification.
Abstract
Hierarchical federated learning (HFL) has emerged as a key architecture for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these environments, uplink bandwidth and latency impose strict communication limits, making aggressive gradient compression essential. One-bit methods such as sign-based stochastic gradient descent (SignSGD) offer an attractive solution in flat federated settings, but existing theory and algorithms do not naturally extend to hierarchical settings. In particular, the interaction between majority-vote aggregation at the edge layer and model aggregation at the cloud layer, and its impact on end-to-end performance, remains unknown. To bridge this gap, we propose a highly communication-efficient sign-based HFL framework and develop its corresponding formulation for nonconvex learning, where devices send only signed stochastic gradients, edge servers combine them through majority vote, and the cloud periodically averages the resulting edge models, using downlink quantization to broadcast the global model. We introduce the resulting scalable HFL algorithm, HierSignSGD, and provide the first convergence analysis of SignSGD in a hierarchical setting. Our core technical contribution is a characterization of how biased sign compression, two-level aggregation intervals, and inter-cluster heterogeneity collectively affect convergence. Numerical experiments under homogeneous and heterogeneous data splits show that HierSignSGD, despite employing extreme compression, achieves accuracy comparable to or better than full-precision stochastic gradient descent at a sharply reduced communication cost, and remains robust under aggressive downlink sparsification.
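The per-round dataflow the abstract describes (device-side sign compression, edge-side majority vote, cloud-side model averaging, and sparsified downlink broadcast) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions; the function names, the top-k choice of downlink sparsifier, and the toy gradients are illustrative and are not taken from the paper.

```python
import numpy as np

def sign_compress(grad):
    # Device side: transmit only the sign of each stochastic
    # gradient coordinate (one bit per coordinate).
    return np.sign(grad)

def majority_vote(signed_grads):
    # Edge side: aggregate device sign vectors by a
    # coordinate-wise majority vote, yielding another sign vector.
    return np.sign(np.sum(signed_grads, axis=0))

def cloud_average(edge_models):
    # Cloud side: periodically average the edge models
    # to form the global model.
    return np.mean(edge_models, axis=0)

def topk_sparsify(vec, k):
    # Downlink: keep only the k largest-magnitude entries of the
    # broadcast model (one possible aggressive sparsifier; the
    # paper's quantizer may differ).
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

# Toy example with three devices in one edge cluster.
g1 = np.array([0.3, -1.2, 0.5])
g2 = np.array([-0.1, -0.4, 0.2])
g3 = np.array([0.7, 0.9, -0.3])
vote = majority_vote([sign_compress(g) for g in (g1, g2, g3)])
# Coordinate-wise: signs are (+,-,+), (-,-,+), (+,+,-) -> vote [1, -1, 1].
```

The majority vote keeps each uplink message at one bit per model coordinate regardless of cluster size, which is the source of the extreme compression the abstract refers to.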