AI Summary
In large-scale edge networks, the conventional averaging-based aggregation used in gossip learning suffers from poor convergence and severe, linear accuracy degradation. To address this, we propose Delta Sum Learning, a mechanism that replaces parameter averaging with weighted summation of model updates (deltas), improving global convergence stability and scalability. We additionally design a decentralized orchestration framework based on the Open Application Model (OAM) that supports Kubernetes-style declarative deployment, dynamic node discovery, and coordinated execution of heterogeneous workloads. Experimental results show that, under low-connectivity conditions with 50 nodes, our approach reduces global accuracy loss by 58% compared to baseline methods; moreover, accuracy decay shifts from linear to logarithmic, significantly improving system robustness and edge adaptability.
Abstract
Federated Learning is a popular approach to distributed learning due to its security and computational benefits. With the advent of powerful devices at the network edge, Gossip Learning further decentralizes Federated Learning by removing centralized aggregation and relying fully on peer-to-peer updates. However, the averaging methods generally used in both Federated and Gossip Learning are not ideal for model accuracy and global convergence. Additionally, there are few options for deploying learning workloads at the edge as part of a larger application using a declarative approach such as Kubernetes manifests. This paper proposes Delta Sum Learning as a method to improve the basic aggregation operation in Gossip Learning, and implements it in a decentralized orchestration framework based on the Open Application Model (OAM), which allows for dynamic node discovery and intent-driven deployment of multi-workload applications. Evaluation results show that Delta Sum performance is on par with alternative integration methods on 10-node topologies, but yields a 58% lower global accuracy drop when scaling to 50 nodes. Overall, it shows strong global convergence and a logarithmic loss of accuracy with increasing topology size, compared to a linear loss for alternatives under limited connectivity.
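To make the contrast concrete, the following is a minimal sketch of the two aggregation styles. The function names, the shared base model, and the per-peer weights are illustrative assumptions for exposition; the paper's exact Delta Sum formulation may differ.

```python
import numpy as np

def average_aggregate(local: np.ndarray, peer: np.ndarray) -> np.ndarray:
    # Conventional gossip aggregation: blend raw parameter vectors.
    return 0.5 * (local + peer)

def delta_sum_aggregate(base: np.ndarray,
                        local: np.ndarray,
                        peer: np.ndarray,
                        w_local: float = 0.5,
                        w_peer: float = 0.5) -> np.ndarray:
    # Delta Sum sketch: sum weighted *updates* (deltas) measured against
    # a common base model, rather than averaging the parameters themselves.
    delta_local = local - base   # local node's learned update
    delta_peer = peer - base     # gossiped peer's learned update
    return base + w_local * delta_local + w_peer * delta_peer
```

With equal weights of 1.0, both nodes' full updates are preserved in the merged model, whereas plain averaging halves each contribution, one intuition for why summation of deltas can scale better as topologies grow.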