Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training

📅 2026-02-02
🤖 AI Summary
This work addresses the scalability bottleneck in distributed graph neural network (GNN) training caused by frequent remote communication across partition boundaries. The authors propose a gradient-only communication framework, wherein each partition independently performs forward and backward passes and exchanges only gradients. To mitigate accuracy degradation, the method integrates periodic graph repartitioning with an unbiased coverage correction mechanism based on importance sampling. Notably, the approach requires neither high-speed interconnects nor neighbor caching and is compatible with both full-graph and mini-batch training paradigms. Experimental results demonstrate an average 4× speedup (up to 13×) over baselines on real-world and synthetic graphs, with superior accuracy—particularly for deep models—enabling efficient trillion-edge-scale GNN training on commodity hardware.

📝 Abstract
Cross-partition edges dominate the cost of distributed GNN training: fetching remote features and activations per iteration overwhelms the network as graphs deepen and partition counts grow. Grappa is a distributed GNN training framework that enforces gradient-only communication: during each iteration, partitions train in isolation and exchange only gradients for the global update. To recover accuracy lost to isolation, Grappa (i) periodically repartitions to expose new neighborhoods and (ii) applies a lightweight coverage-corrected gradient aggregation inspired by importance sampling. We prove the corrected estimator is asymptotically unbiased under standard support and boundedness assumptions, and we derive a batch-level variant for compatibility with common deep-learning packages that minimizes mean-squared deviation from the ideal node-level correction. We also introduce a shrinkage version that improves stability in practice. Empirical results on real and synthetic graphs show that Grappa trains GNNs 4 times faster on average (up to 13 times) than state-of-the-art systems, achieves better accuracy especially for deeper models, and sustains training at the trillion-edge scale on commodity hardware. Grappa is model-agnostic, supports full-graph and mini-batch training, and does not rely on high-bandwidth interconnects or caching.
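As a rough illustration of the coverage-corrected aggregation described above (a sketch inspired by importance sampling, not the authors' implementation), gradients from isolated partitions can be reweighted by the inverse of each partition's coverage probability, with an optional shrinkage blend toward the plain average for stability. The function name, `coverage_probs`, and the `shrinkage` parameter are hypothetical:

```python
import numpy as np

def coverage_corrected_aggregate(partition_grads, coverage_probs, shrinkage=0.0):
    """Aggregate per-partition gradients with an inverse-coverage
    (importance-sampling style) correction.

    partition_grads: list of per-partition gradient vectors.
    coverage_probs:  probability that each partition's local subgraph
                     covers the neighborhoods it trained on (hypothetical).
    shrinkage:       blend factor toward the uncorrected mean (0 = fully
                     corrected, 1 = plain average), mirroring the paper's
                     shrinkage variant for stability.
    """
    grads = np.asarray(partition_grads, dtype=float)
    probs = np.asarray(coverage_probs, dtype=float)

    # Inverse-coverage importance weights, normalized to sum to one.
    weights = 1.0 / probs
    weights /= weights.sum()

    corrected = (weights[:, None] * grads).sum(axis=0)
    plain = grads.mean(axis=0)
    return (1.0 - shrinkage) * corrected + shrinkage * plain
```

With equal coverage probabilities the correction reduces to a plain average; under-covered partitions are up-weighted otherwise. In a real deployment the corrected gradient would then be exchanged via a collective operation such as all-reduce, which is the only communication step in a gradient-only scheme.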
Problem

Research questions and friction points this paper is trying to address.

distributed GNN training
cross-partition edges
communication overhead
scalability
graph neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient-only communication
distributed GNN training
graph repartitioning
coverage-corrected gradient aggregation
scalable graph learning
Chongyang Xu
Max Planck Institute for Software Systems (MPI-SWS), Saarbrücken, Germany
Christoph Siebenbrunner
Vienna University of Economics and Business (WU), Vienna, Austria
Laurent Bindschaedler
Research Group Leader, MPI-SWS
Big Data, Distributed Systems, Machine Learning, Cloud Computing, Security