Scaling Laws of Graph Neural Networks for Atomistic Materials Modeling

📅 2025-04-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the scalability limits of Graph Neural Networks (GNNs) for atomic-scale materials modeling, establishing the first systematic scaling law for atomistic GNNs that characterizes the interplay among model capacity, dataset size, and prediction accuracy. We adapt training techniques from large language models (LLMs), including distributed optimization, sparse attention, gradient checkpointing, and mixed-precision arithmetic, to GNNs, and use them to develop the first billion-parameter foundational model for materials science trained on terabyte-scale data. Empirical analysis reveals power-law scaling of GNN performance with respect to both parameter count and training-data volume. We open-source a high-performance codebase that enables efficient inference on systems of up to 10 billion atoms. On standard benchmarks, including QM9 and the Materials Project, the model achieves state-of-the-art generalization accuracy, improving mean absolute error by 12–18% over prior methods.
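The power-law scaling reported in the summary, error falling as a power of model size, can be fitted with ordinary least squares in log-log space. The sketch below uses purely illustrative numbers (not taken from the paper) to show the fitting procedure for a hypothetical relation MAE ≈ a · N^(−b):

```python
import math

# Hypothetical data: model sizes (parameter counts) and validation MAE.
# These numbers are illustrative only, not results from the paper.
params = [1e6, 1e7, 1e8, 1e9]
mae = [0.120, 0.085, 0.060, 0.042]  # roughly follows a power law

# Fit mae ≈ a * params^(-b) via least squares in log-log space:
#   log(mae) = log(a) - b * log(params)
xs = [math.log(n) for n in params]
ys = [math.log(e) for e in mae]
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
b = -slope                           # power-law exponent
a = math.exp(y_mean - slope * x_mean)  # prefactor

print(f"fitted exponent b = {b:.3f}, prefactor a = {a:.3g}")
# Extrapolating beyond the fitted range is speculative; treat with care:
print(f"predicted MAE at 1e10 params: {a * (1e10) ** (-b):.4f}")
```

A straight line in log-log coordinates is the standard diagnostic for power-law scaling; curvature at the large-N end would instead indicate the onset of saturation.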

📝 Abstract
Atomistic materials modeling is a critical task with wide-ranging applications, from drug discovery to materials science, where accurate predictions of the target material property can lead to significant advancements in scientific discovery. Graph Neural Networks (GNNs) represent the state-of-the-art approach for modeling atomistic material data thanks to their capacity to capture complex relational structures. While machine learning performance has historically improved with larger models and datasets, GNNs for atomistic materials modeling remain relatively small compared to large language models (LLMs), which leverage billions of parameters and terabyte-scale datasets to achieve remarkable performance in their respective domains. To address this gap, we explore the scaling limits of GNNs for atomistic materials modeling by developing a foundational model with billions of parameters, trained on extensive terabyte-scale datasets. Our approach incorporates techniques from LLM libraries to efficiently manage large-scale data and models, enabling both effective training and deployment of these large-scale GNN models. This work addresses three fundamental questions in scaling GNNs: the potential for scaling GNN model architectures, the effect of dataset size on model accuracy, and the applicability of LLM-inspired techniques to GNN architectures. Specifically, the outcomes of this study include (1) insights into the scaling laws for GNNs, highlighting the relationship between model size, dataset volume, and accuracy, (2) a foundational GNN model optimized for atomistic materials modeling, and (3) a GNN codebase enhanced with advanced LLM-based training techniques. Our findings lay the groundwork for large-scale GNNs with billions of parameters and terabyte-scale datasets, establishing a scalable pathway for future advancements in atomistic materials modeling.
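One of the LLM training techniques named in the summary, gradient checkpointing, trades compute for memory: instead of storing every intermediate activation for backpropagation, only every k-th activation is kept, and the rest are recomputed during the backward pass. The toy sketch below illustrates the idea on a chain of scalar `sin` layers; it is a conceptual illustration, not the authors' implementation:

```python
import math

N_LAYERS = 12
K = 4  # checkpoint interval (assumes N_LAYERS % K == 0)

def grad_full(x):
    """Backprop through y = sin(sin(...sin(x))), storing ALL activations."""
    acts = [x]
    for _ in range(N_LAYERS):
        acts.append(math.sin(acts[-1]))
    g = 1.0
    for a in reversed(acts[:-1]):
        g *= math.cos(a)  # d sin(a)/da, chain rule
    return g

def grad_checkpointed(x):
    """Same gradient, storing only every K-th activation (checkpoints)."""
    # Forward pass: keep checkpoints only, discard the rest.
    ckpts, a = [x], x
    for i in range(1, N_LAYERS + 1):
        a = math.sin(a)
        if i % K == 0:
            ckpts.append(a)
    # Backward pass: recompute each segment's activations from its checkpoint.
    g = 1.0
    for seg in reversed(range(len(ckpts) - 1)):
        seg_acts = [ckpts[seg]]
        for _ in range(K):
            seg_acts.append(math.sin(seg_acts[-1]))
        for a in reversed(seg_acts[:-1]):
            g *= math.cos(a)
    return g

# Both strategies yield the same gradient; the checkpointed version holds
# N/K + 1 activations at a time instead of N + 1.
print(grad_full(0.5), grad_checkpointed(0.5))
```

In a real billion-parameter GNN, the "activations" are per-node feature tensors across message-passing layers, so this memory saving is what makes very deep or very wide models fit on a single accelerator.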
Problem

Research questions and friction points this paper is trying to address.

Exploring scaling limits of GNNs for atomistic materials modeling
Investigating impact of dataset size on GNN accuracy
Applying LLM-inspired techniques to GNN architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed billion-parameter GNN for materials modeling
Used LLM techniques for large-scale data management
Explored scaling laws linking model size and accuracy