Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Training billion-parameter universal machine-learned interatomic potentials (uMLIPs) faces two obstacles: no efficient parallel training framework supports the second-order derivatives that uMLIP training requires, and scaling the model creates computation and communication bottlenecks. This work proposes MatRIS-MoE, a mixture-of-experts architecture, together with Janus, a high-dimensional distributed training framework that, for the first time, enables efficient exascale parallel training with second-order derivative support. With hardware-aware communication optimizations, the system reaches 1.2 and 1.0 EFLOPS in single precision (24% and 35.5% of theoretical peak, respectively) on two exascale supercomputers, with parallel efficiency above 90%. This cuts training time from weeks to hours, substantially accelerating the development of AI-for-Science foundation models.
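The throughput figures in the summary can be cross-checked with simple arithmetic. The sketch below is not from the paper: it only divides each reported throughput by its reported fraction of peak to estimate the single-precision peak that fraction refers to. The labels `system A` and `system B` are placeholders, since the machines are not named here, and the results inherit the rounding of the quoted numbers.

```python
# Back-of-the-envelope check, assuming "fraction of peak" means
# achieved throughput divided by theoretical single-precision peak.

def implied_peak_eflops(achieved_eflops: float, fraction_of_peak: float) -> float:
    """Return the theoretical peak implied by an achieved rate and its fraction of peak."""
    if not 0.0 < fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    return achieved_eflops / fraction_of_peak

# (achieved EFLOPS, fraction of theoretical peak) as quoted in the summary.
# The system names are hypothetical placeholders.
reported = {
    "system A": (1.2, 0.24),
    "system B": (1.0, 0.355),
}

for name, (achieved, fraction) in reported.items():
    peak = implied_peak_eflops(achieved, fraction)
    print(f"{name}: {achieved} EFLOPS at {fraction:.1%} of peak "
          f"implies a peak of about {peak:.2f} EFLOPS")
```

Under that reading, the two results correspond to peaks of roughly 5.0 and 2.8 EFLOPS. Whether those peaks describe whole machines or only the partitions used for the runs is not stated in this summary.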

📝 Abstract

Universal Machine Learning Interatomic Potentials (uMLIPs), pre-trained on massively diverse datasets encompassing inorganic materials and organic molecules across the entire periodic table, serve as foundation models for quantum-accurate physical simulations. However, uMLIP training requires second-order derivatives, for which no parallel training framework exists; moreover, scaling to the billion-parameter regime causes explosive growth in computation and communication overhead, making training a tremendous challenge. We introduce MatRIS-MoE, a billion-parameter Mixture-of-Experts model built upon an invariant architecture, and Janus, a pioneering high-dimensional distributed training framework for uMLIPs with hardware-aware optimizations. Deployed across two exascale supercomputers, our code attains a peak performance of 1.2/1.0 EFLOPS (24%/35.5% of theoretical peak) in single precision at over 90% parallel efficiency, compressing the training of billion-parameter uMLIPs from weeks to hours. This work establishes a new high-water mark for AI-for-Science (AI4S) foundation models at exascale and provides essential infrastructure for rapid scientific discovery.
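The abstract does not spell out why uMLIP training needs second-order derivatives. A common reason, assumed here rather than taken from the paper, is that forces are obtained as the negative gradient of the predicted energy with respect to atomic positions, so the gradient of a force-matching loss with respect to the model weights involves mixed second derivatives of the energy. The minimal sketch below uses a hypothetical one-dimensional harmonic "model" with two weights (`k`, `r0`) in place of a neural network and checks the analytic gradient against finite differences.

```python
# Toy illustration (not the MatRIS-MoE model): energy E(r; k, r0) = k * (r - r0)^2.
# The force is F = -dE/dr, and the loss compares F with a reference force.

def energy(r, w):
    k, r0 = w
    return k * (r - r0) ** 2

def force(r, w):
    """First-order derivative of the energy: F = -dE/dr."""
    k, r0 = w
    return -2.0 * k * (r - r0)

def force_loss(r, w, f_ref):
    """Squared error between predicted and reference force."""
    return (force(r, w) - f_ref) ** 2

def force_loss_grad(r, w, f_ref):
    """Gradient of the loss w.r.t. the weights.

    dL/dw = 2 * (F - f_ref) * dF/dw, and dF/dw = -d^2E/(dw dr),
    so mixed second derivatives of the energy appear explicitly.
    """
    k, r0 = w
    residual = force(r, w) - f_ref
    d2E_dk_dr = 2.0 * (r - r0)   # d/dk of dE/dr
    d2E_dr0_dr = -2.0 * k        # d/dr0 of dE/dr
    return (2.0 * residual * (-d2E_dk_dr),
            2.0 * residual * (-d2E_dr0_dr))

def finite_difference_grad(r, w, f_ref, eps=1e-6):
    """Central finite differences of the loss, used only as a check."""
    grads = []
    for i in range(len(w)):
        plus, minus = list(w), list(w)
        plus[i] += eps
        minus[i] -= eps
        grads.append((force_loss(r, plus, f_ref) - force_loss(r, minus, f_ref)) / (2 * eps))
    return tuple(grads)

r, w, f_ref = 1.3, (2.0, 1.0), -1.0
print("analytic gradient:  ", force_loss_grad(r, w, f_ref))
print("finite differences: ", finite_difference_grad(r, w, f_ref))
```

In a real uMLIP the same structure arises through automatic differentiation: one backward pass yields the forces, and a second pass through that computation yields the weight gradients, which is the kind of workload a distributed training framework would have to support.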

Problem

Research questions and friction points this paper is trying to address.

Universal Machine Learning Interatomic Potentials

billion-parameter models

second-order derivatives

distributed training

Exascale computing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts

distributed training

interatomic potentials

exascale computing

second-order derivatives
