Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training

📅 2026-02-02
🤖 AI Summary
This work addresses the scalability bottleneck in distributed graph neural network (GNN) training caused by frequent remote communication across partition boundaries. The authors propose a gradient-only communication framework, wherein each partition independently performs forward and backward passes and exchanges only gradients. To mitigate accuracy degradation, the method integrates periodic graph repartitioning with an unbiased coverage correction mechanism based on importance sampling. Notably, the approach requires neither high-speed interconnects nor neighbor caching and is compatible with both full-graph and mini-batch training paradigms. Experimental results demonstrate an average 4× speedup (up to 13×) over baselines on real-world and synthetic graphs, with superior accuracy—particularly for deep models—enabling efficient trillion-edge-scale GNN training on commodity hardware.

📝 Abstract
Cross-partition edges dominate the cost of distributed GNN training: fetching remote features and activations per iteration overwhelms the network as graphs deepen and partition counts grow. Grappa is a distributed GNN training framework that enforces gradient-only communication: during each iteration, partitions train in isolation and exchange only gradients for the global update. To recover accuracy lost to isolation, Grappa (i) periodically repartitions to expose new neighborhoods and (ii) applies a lightweight coverage-corrected gradient aggregation inspired by importance sampling. We prove the corrected estimator is asymptotically unbiased under standard support and boundedness assumptions, and we derive a batch-level variant for compatibility with common deep-learning packages that minimizes mean-squared deviation from the ideal node-level correction. We also introduce a shrinkage version that improves stability in practice. Empirical results on real and synthetic graphs show that Grappa trains GNNs 4 times faster on average (up to 13 times) than state-of-the-art systems, achieves better accuracy especially for deeper models, and sustains training at the trillion-edge scale on commodity hardware. Grappa is model-agnostic, supports full-graph and mini-batch training, and does not rely on high-bandwidth interconnects or caching.
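As a rough illustration of the coverage-corrected aggregation described above (a sketch inspired by importance sampling, not the authors' implementation), gradients from isolated partitions can be reweighted by the inverse of each partition's coverage probability, with an optional shrinkage blend toward the plain average for stability. The function name, `coverage_probs`, and the `shrinkage` parameter are hypothetical:

```python
import numpy as np

def coverage_corrected_aggregate(partition_grads, coverage_probs, shrinkage=0.0):
    """Aggregate per-partition gradients with an inverse-coverage
    (importance-sampling style) correction.

    partition_grads: list of per-partition gradient vectors.
    coverage_probs:  probability that each partition's local subgraph
                     covers the neighborhoods it trained on (hypothetical).
    shrinkage:       blend factor toward the uncorrected mean (0 = fully
                     corrected, 1 = plain average), mirroring the paper's
                     shrinkage variant for stability.
    """
    grads = np.asarray(partition_grads, dtype=float)
    probs = np.asarray(coverage_probs, dtype=float)

    # Inverse-coverage importance weights, normalized to sum to one.
    weights = 1.0 / probs
    weights /= weights.sum()

    corrected = (weights[:, None] * grads).sum(axis=0)
    plain = grads.mean(axis=0)
    return (1.0 - shrinkage) * corrected + shrinkage * plain
```

With equal coverage probabilities the correction reduces to a plain average; under-covered partitions are up-weighted otherwise. In a real deployment the corrected gradient would then be exchanged via a collective operation such as all-reduce, which is the only communication step in a gradient-only scheme.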
Problem

Research questions and friction points this paper is trying to address.

distributed GNN training
cross-partition edges
communication overhead
scalability
graph neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient-only communication
distributed GNN training
graph repartitioning
coverage-corrected gradient aggregation
scalable graph learning
Chongyang Xu
Max Planck Institute for Software Systems (MPI-SWS), Saarbrücken, Germany
Christoph Siebenbrunner
Vienna University of Economics and Business (WU), Vienna, Austria
Laurent Bindschaedler
Research Group Leader, MPI-SWS
Big Data, Distributed Systems, Machine Learning, Cloud Computing, Security