Sampling-based Distributed Training with Message Passing Neural Network

📅 2024-02-23
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Scalability of edge-level message-passing neural networks (MPNNs) is severely limited on ultra-large graphs (≥10⁵ nodes) due to explosive edge growth and GPU memory constraints. To address this, we propose DS-MPNN—a distributed MPNN training framework integrating domain decomposition and Nyström sampling. DS-MPNN is the first to incorporate Nyström low-rank approximation into edge-level message passing, enabling memory-efficient distributed data parallelism without compromising modeling granularity. By partitioning the graph into spatially coherent subdomains and approximating global edge interactions via Nyström sampling, DS-MPNN decouples computation from full-graph storage. Evaluated on Darcy flow and 2D airfoil RANS simulation tasks, DS-MPNN matches the accuracy of monolithic single-GPU MPNNs while scaling to over 10× more nodes. It significantly outperforms node-level baselines (e.g., GCN) in both fidelity and scalability. DS-MPNN establishes a high-fidelity, computationally efficient paradigm for distributed edge-level learning on large-scale physical graphs.
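The core memory-saving idea described above, aggregating messages over a random subset of edges rather than all of them, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the plain mean aggregation, and the residual update are all assumptions chosen for brevity, standing in for the learned message and update networks of a real MPNN layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_message_passing(x, edges, n_samples, rng):
    """One toy message-passing step in which each node aggregates
    messages from a random subset of edges (hypothetical stand-in
    for the paper's Nystrom-style edge sampling), so per-step cost
    is bounded by n_samples rather than the full edge count."""
    n_nodes, _ = x.shape
    # Sample a subset of edges instead of iterating over all of them.
    idx = rng.choice(len(edges), size=min(n_samples, len(edges)), replace=False)
    agg = np.zeros_like(x)
    count = np.zeros((n_nodes, 1))
    for i, j in edges[idx]:
        agg[j] += x[i]        # message from sender i to receiver j
        count[j] += 1
    # Mean-aggregate the sampled messages, then a simple residual update.
    return x + agg / np.maximum(count, 1)

x = rng.normal(size=(6, 4))  # 6 nodes, 4 features each
edges = np.array([(i, j) for i in range(6) for j in range(6) if i != j])
out = sampled_message_passing(x, edges, n_samples=10, rng=rng)
print(out.shape)  # (6, 4)
```

In the paper's setting the sampled aggregation replaces a dense pass over all edges, which is what lets edge-level message passing scale past the GPU-memory limit of a monolithic implementation.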

📝 Abstract
In this study, we introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNN). Our objective is to address the challenge of scaling edge-based graph neural networks as the number of nodes increases. Through our distributed training approach, coupled with Nyström-approximation sampling techniques, we present a scalable graph neural network, referred to as DS-MPNN (D and S standing for distributed and sampled, respectively), capable of scaling up to $O(10^5)$ nodes. We validate our sampling and distributed training approach on two cases: (a) a Darcy flow dataset and (b) steady RANS simulations of 2-D airfoils, providing comparisons with both single-GPU implementation and node-based graph convolution networks (GCNs). The DS-MPNN model demonstrates comparable accuracy to single-GPU implementation, can accommodate a significantly larger number of nodes compared to the single-GPU variant (S-MPNN), and significantly outperforms the node-based GCN.
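The other ingredient, domain decomposition, partitions the mesh into spatially coherent subdomains so that each worker holds only its own slab of nodes. A minimal 1-D sketch of such a partition follows; the function, the slab-along-x strategy, and the halo width are illustrative assumptions, not the paper's actual decomposition scheme.

```python
import numpy as np

def decompose_1d(coords, n_parts, halo):
    """Split nodes into spatially coherent slabs along one axis,
    each padded with a halo region so that subdomain-local message
    passing can still see neighbors just across a cut (one slab per
    worker/GPU in a hypothetical distributed setup)."""
    cuts = np.linspace(coords.min(), coords.max(), n_parts + 1)
    parts = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mask = (coords >= lo - halo) & (coords <= hi + halo)
        parts.append(np.where(mask)[0])
    return parts

coords = np.linspace(0.0, 1.0, 20)  # toy 1-D node coordinates
parts = decompose_1d(coords, n_parts=4, halo=0.05)
# Every node lands in at least one subdomain; halos overlap at the cuts.
assert np.array_equal(np.unique(np.concatenate(parts)), np.arange(20))
```

The overlapping halos are what decouple per-subdomain computation from full-graph storage: each worker trains on its padded slab, and only halo information needs to be exchanged.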
Problem

Research questions and friction points this paper is trying to address.

Scalability of edge-based graph neural networks
Distributed training for large node counts
Nyström-approximation sampling in neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-decomposition-based distributed training
Nyström-approximation sampling techniques
Scalable DS-MPNN for large graphs
P. Kakka
University of Notre Dame, USA
Sheel Nidhan
Ansys, Inc., USA
Rishikesh Ranade
Senior Engineer - Physics ML, NVIDIA
Machine Learning · Computational Science
J. MacArt
University of Notre Dame, USA