🤖 AI Summary
To address the poor scalability and inefficiency of message-passing Graph Neural Network (GNN) architectures on large, complex meshes in mesh-based physics simulation, this paper proposes a Graph Transformer architecture tailored for very large meshes (up to 300k nodes and 3 million edges). The method uses the adjacency matrix as a sparse attention mask and augments it with dilated sliding-window and global attention to extend receptive fields and model both local and global features without sacrificing efficiency. It further explores positional encodings and $K$-hop configurations, training end-to-end on challenging 3D computational fluid dynamics (CFD) datasets. Experiments show that the smallest model matches MeshGraphNet’s accuracy while being 7× faster at inference with 6× fewer parameters; the largest model outperforms MeshGraphNet by 52% on the all-rollout RMSE and surpasses the previous state-of-the-art by 38.8% on average.
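The sliding-window and global-attention augmentations mentioned above can be illustrated with a small sketch. The function below builds a boolean attention mask in which each node attends to index neighbours at dilated offsets plus a handful of global tokens; the exact mask construction used in the paper may differ, so treat this as an illustration of the general pattern, not the authors' implementation.

```python
import numpy as np

def dilated_window_mask(n, window, dilation, n_global=1):
    """Boolean (n, n) attention mask: each node attends to itself and to
    nodes at index offsets ±dilation, ±2*dilation, ..., ±window*dilation
    (a dilated sliding window). The first `n_global` nodes act as global
    tokens that attend to, and are attended by, every node."""
    mask = np.eye(n, dtype=bool)
    for k in range(1, window + 1):
        off = k * dilation
        idx = np.arange(n - off)
        mask[idx, idx + off] = True   # neighbour `off` steps ahead
        mask[idx + off, idx] = True   # neighbour `off` steps behind
    mask[:n_global, :] = True         # global tokens see every node
    mask[:, :n_global] = True         # every node sees global tokens
    return mask

m = dilated_window_mask(8, window=2, dilation=2, n_global=1)
```

Because the mask is sparse (O(n · window) non-zeros plus the global rows and columns), it keeps attention cost far below the dense O(n²) while still letting information propagate across the whole mesh through the global tokens.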
📝 Abstract
Simulating physics using Graph Neural Networks (GNNs) is predominantly driven by message-passing architectures, which face challenges in scaling and efficiency, particularly in handling large, complex meshes. These architectures have inspired numerous enhancements, including multigrid approaches and $K$-hop aggregation (using neighbours of distance $K$), yet they often introduce significant complexity and have received limited in-depth investigation. In response to these challenges, we propose a novel Graph Transformer architecture that leverages the adjacency matrix as an attention mask. The proposed approach incorporates innovative augmentations, including Dilated Sliding Windows and Global Attention, to extend receptive fields without sacrificing computational efficiency. Through extensive experimentation, we evaluate model size, adjacency matrix augmentations, positional encoding and $K$-hop configurations using challenging 3D computational fluid dynamics (CFD) datasets. We also train over 60 models to find a scaling law between training FLOPs and parameters. The introduced models demonstrate remarkable scalability, operating on meshes with up to 300k nodes and 3 million edges. Notably, the smallest model achieves parity with MeshGraphNet while being $7\times$ faster and $6\times$ smaller. The largest model surpasses the previous state-of-the-art by $38.8$% on average and outperforms MeshGraphNet by $52$% on the all-rollout RMSE, while having a similar training speed. Code and datasets are available at https://github.com/DonsetPG/graph-physics.
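The core idea of using the adjacency matrix as an attention mask can be sketched in a few lines. The single-head attention below restricts each node to attend only to its mesh neighbours (and itself); this is a minimal NumPy illustration of the masking principle, not the paper's actual multi-head implementation, and the function and weight names are placeholders.

```python
import numpy as np

def adjacency_masked_attention(x, adj, w_q, w_k, w_v):
    """Single-head attention in which node i may only attend to
    neighbours j with adj[i, j] == 1, plus itself.

    x:   (n, d) node features
    adj: (n, n) binary adjacency matrix of the mesh graph
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # dense scores (n, n)
    mask = adj + np.eye(adj.shape[0])              # allow self-attention
    scores = np.where(mask > 0, scores, -np.inf)   # mask out non-edges
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v

# Tiny 4-node path graph: 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = adjacency_masked_attention(x, adj, w_q, w_k, w_v)
```

In practice the masked entries would never be materialised: a sparse attention kernel computes scores only for the stored edges, which is what makes the approach scale to millions of edges.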