🤖 AI Summary
Existing equivariant graph neural networks (EGNNs) suffer from two key bottlenecks on large-scale geometric graphs: low computational efficiency and severe performance degradation under sparsification. To address these, we propose FastEGNN and DistEGNN. FastEGNN approximates unordered large graphs via an ordered set of virtual nodes and introduces a differentiated message-passing mechanism to enhance modeling fidelity. DistEGNN leverages virtual nodes as global bridges across subgraphs in a distributed architecture and enforces global distributional consistency via maximum mean discrepancy (MMD) minimization. Both methods rigorously preserve SE(3)-equivariance while significantly improving scalability. Experiments on N-body, protein dynamics, Water-3D, and our newly constructed large-scale Fluid113K dataset (113K nodes) demonstrate substantial gains over state-of-the-art EGNNs, achieving simultaneous improvements in training speed and prediction accuracy. To our knowledge, this is the first framework enabling scalable, equivariant learning on ultra-large geometric graphs.
📄 Abstract
Equivariant Graph Neural Networks (GNNs) have achieved remarkable success across diverse scientific applications. However, existing approaches face critical efficiency challenges when scaling to large geometric graphs and suffer significant performance degradation when the input graphs are sparsified for computational tractability. To address these limitations, we introduce FastEGNN and DistEGNN, two novel enhancements to equivariant GNNs for large-scale geometric graphs. FastEGNN employs a key innovation: a small ordered set of virtual nodes that effectively approximates the large unordered graph of real nodes. Specifically, we implement distinct message passing and aggregation mechanisms for different virtual nodes to ensure mutual distinctiveness, and minimize the Maximum Mean Discrepancy (MMD) between virtual and real coordinates so that the virtual nodes cover the graph's global geometry. This design enables FastEGNN to maintain high accuracy while efficiently processing large-scale sparse graphs. For extremely large-scale geometric graphs, we present DistEGNN, a distributed extension in which virtual nodes act as global bridges between subgraphs on different devices, maintaining consistency while dramatically reducing memory and computational overhead. We comprehensively evaluate our models across four challenging domains: N-body systems (100 nodes), protein dynamics (800 nodes), Water-3D (8,000 nodes), and our new Fluid113K benchmark (113,000 nodes). Results demonstrate superior efficiency and performance, establishing new capabilities in large-scale equivariant graph learning. Code is available at https://github.com/GLAD-RUC/DistEGNN.
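The abstract mentions minimizing the Maximum Mean Discrepancy (MMD) between virtual and real node coordinates. The exact kernel and loss used in the paper are not specified here; the sketch below shows one common way such an MMD term could be computed, assuming an RBF kernel and a biased squared-MMD estimate (both choices are illustrative, not taken from the paper).

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF (Gaussian) kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(real_coords, virtual_coords, sigma=1.0):
    """Biased estimate of squared MMD between two point sets.

    Near zero when the virtual coordinates are distributed like the
    real ones; larger when the virtual nodes cluster away from the
    real geometry. Could serve as an auxiliary loss term.
    """
    k_rr = rbf_kernel(real_coords, real_coords, sigma).mean()
    k_vv = rbf_kernel(virtual_coords, virtual_coords, sigma).mean()
    k_rv = rbf_kernel(real_coords, virtual_coords, sigma).mean()
    return k_rr + k_vv - 2.0 * k_rv

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 3))        # e.g. real node positions in 3D
spread = rng.normal(size=(8, 3))        # virtual nodes spread like the real ones
clumped = np.full((8, 3), 5.0)          # virtual nodes far from the real cloud
print(mmd2(real, spread))               # small value
print(mmd2(real, clumped))              # much larger value
```

Minimizing such a term with respect to the virtual coordinates would pull them toward a globally representative placement over the real point cloud, which matches the "global distributedness" goal described above.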