RapidGNN: Energy and Communication-Efficient Distributed Training on Large-Scale Graph Neural Networks

📅 2025-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high remote-feature communication overhead and poor scalability of distributed training for large-scale Graph Neural Networks (GNNs), this paper proposes RapidGNN. Its core innovation is a **deterministic sampling-based scheduling mechanism** that jointly optimizes cache construction and prefetching so that remote features can be supplied locally and efficiently. By replacing stochastic sampling, whose communication patterns are inherently unpredictable, with a deterministic schedule, RapidGNN significantly reduces cross-node data transfers in heterogeneous CPU/GPU environments. Experiments on multiple benchmark datasets demonstrate that RapidGNN achieves 2.46×–3.00× higher end-to-end training throughput, reduces remote feature fetches by 9.70×–15.39×, and improves CPU and GPU energy efficiency by 44% and 32%, respectively. Moreover, it exhibits near-linear scalability as the number of computing units increases.

📝 Abstract
Graph Neural Networks (GNNs) have become popular across a diverse set of tasks in exploring structural relationships between entities. However, due to the highly connected structure of the datasets, distributed training of GNNs on large-scale graphs poses significant challenges. Traditional sampling-based approaches mitigate the computational load, yet the communication overhead remains a challenge. This paper presents RapidGNN, a distributed GNN training framework with deterministic sampling-based scheduling to enable efficient cache construction and prefetching of remote features. Evaluation on benchmark graph datasets demonstrates RapidGNN's effectiveness across different scales and topologies. RapidGNN improves end-to-end training throughput by 2.46x to 3.00x on average over baseline methods across the benchmark datasets, while cutting remote feature fetches by 9.70x to 15.39x. RapidGNN further demonstrates near-linear scalability with an increasing number of computing units. Furthermore, it achieves higher energy efficiency than the baseline methods for both CPU and GPU, by 44% and 32%, respectively.
Problem

Research questions and friction points this paper is trying to address.

Distributed training challenges for large-scale GNNs
Communication overhead in sampling-based GNN approaches
Energy and computational efficiency in GNN training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed GNN training with deterministic sampling-based scheduling
Efficient cache construction and prefetching of remote features
Achieves near-linear scalability and significant energy efficiency improvements
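The key idea behind the innovations above can be sketched as follows: if each mini-batch's neighbor sample is a deterministic function of a fixed seed and the batch index, then the exact set of remote features every batch will need is known before training starts, so caches and prefetches can be planned in advance. The sketch below is illustrative only; the names (`deterministic_sample`, `plan_prefetch`) and the simple seed-plus-batch-index seeding are assumptions for this example, not the paper's actual scheduler.

```python
import random

def deterministic_sample(seed, batch_id, neighbors, fanout):
    """Sample up to `fanout` neighbors. The result depends only on
    (seed, batch_id, neighbors), so it is reproducible: the same call
    made before training and during training yields the same sample."""
    rng = random.Random(seed * 1_000_003 + batch_id)  # per-batch derived seed
    if len(neighbors) <= fanout:
        return list(neighbors)
    return rng.sample(neighbors, fanout)

def plan_prefetch(seed, batches, adjacency, fanout, local_nodes):
    """Precompute, for each batch, the set of remote node features it
    will request. Because sampling is deterministic, this plan exactly
    matches the samples drawn later, enabling cache construction and
    prefetching instead of on-demand remote fetches."""
    plan = {}
    for batch_id, seed_nodes in enumerate(batches):
        remote = set()
        for v in seed_nodes:
            for u in deterministic_sample(seed, batch_id, adjacency[v], fanout):
                if u not in local_nodes:  # feature lives on another machine
                    remote.add(u)
        plan[batch_id] = remote
    return plan
```

With stochastic sampling this plan would be impossible to compute ahead of time, since the sampled neighbors (and hence the remote requests) would differ on every run; determinism is what turns unpredictable communication into a schedulable prefetch workload.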