Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

📅 2026-04-17

🤖 AI Summary
Training billion-parameter universal machine-learned interatomic potentials (uMLIPs) is hindered by the absence of efficient parallel frameworks that support second-order derivatives and by computation and communication bottlenecks that grow with model scale. This work proposes MatRIS-MoE, a mixture-of-experts architecture, together with Janus, a high-dimensional distributed training framework, which for the first time enable exascale-efficient parallel training with second-order derivative support. By integrating hardware-aware communication optimizations, the system achieves 1.2 and 1.0 EFLOPS of single-precision performance (24% and 35.5% of theoretical peak) on two exascale supercomputers, respectively, with parallel efficiency exceeding 90%. This reduces training time from weeks to hours, substantially accelerating the development of foundational AI-for-Science models.
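
As a rough consistency check on these figures: a sustained rate S reached at a fraction f of theoretical peak implies a peak of S / f. The sketch below applies that division to the two quoted pairs. The labels "machine A" and "machine B" are placeholders, since the summary does not name the supercomputers, and the result is the peak of whatever hardware each run used, not necessarily of the full machine.

```python
# Back-of-envelope check: sustained rate S at fraction f of peak implies peak = S / f.
# "machine A" / "machine B" are placeholder labels; the source does not name them.


def implied_peak(sustained_eflops: float, fraction_of_peak: float) -> float:
    """Return the theoretical peak (EFLOPS) implied by a sustained rate."""
    if not 0.0 < fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    return sustained_eflops / fraction_of_peak


runs = {
    "machine A": (1.2, 0.24),   # 1.2 EFLOPS FP32 at 24% of peak
    "machine B": (1.0, 0.355),  # 1.0 EFLOPS FP32 at 35.5% of peak
}

peaks = {name: implied_peak(s, f) for name, (s, f) in runs.items()}
for name, peak in peaks.items():
    print(f"{name}: implied FP32 peak ~ {peak:.2f} EFLOPS")
```

Run as-is, it prints about 5.00 EFLOPS for the first run and 2.82 EFLOPS for the second.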

📝 Abstract
Universal Machine Learning Interatomic Potentials (uMLIPs), pre-trained on massively diverse datasets encompassing inorganic materials and organic molecules across the entire periodic table, serve as foundational models for quantum-accurate physical simulations. However, uMLIP training requires second-order derivatives, for which no corresponding parallel training frameworks exist; moreover, scaling to the billion-parameter regime causes explosive growth in computation and communication overhead, making training a tremendous challenge. We introduce MatRIS-MoE, a billion-parameter Mixture-of-Experts model built upon an invariant architecture, and Janus, a pioneering high-dimensional distributed training framework for uMLIPs with hardware-aware optimizations. Deployed across two Exascale supercomputers, our code attains a peak performance of 1.2/1.0 EFLOPS (24%/35.5% of theoretical peak) in single precision at over 90% parallel efficiency, compressing the training of billion-parameter uMLIPs from weeks to hours. This work establishes a new high-water mark for AI-for-Science (AI4S) foundation models at Exascale and provides essential infrastructure for rapid scientific discovery.
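
The usual reason uMLIP training needs second-order derivatives is that forces are obtained as F = -dE/dr, the negative gradient of the predicted energy with respect to atomic positions, so a force term in the loss obliges backpropagation to differentiate through that gradient. The toy below assumes a hypothetical one-parameter harmonic pair energy (not MatRIS-MoE) and uses finite differences in place of automatic differentiation to stay dependency-free; it checks that the loss gradient with respect to the stiffness k matches the analytic expression built from the mixed derivative d2E/(dk dr).

```python
# Toy force-matching example (hypothetical; not the MatRIS-MoE model).
# Energy of one interatomic distance r under a harmonic "potential"
# with learnable stiffness k: E(r; k) = 0.5 * k * (r - R0)**2.
R0 = 1.0  # hypothetical equilibrium distance


def energy(r: float, k: float) -> float:
    return 0.5 * k * (r - R0) ** 2


def force(r: float, k: float, h: float = 1e-5) -> float:
    # F = -dE/dr via a central difference; an autodiff framework would
    # compute this with a first backward pass through energy().
    return -(energy(r + h, k) - energy(r - h, k)) / (2 * h)


def force_loss(k: float, r: float, f_ref: float) -> float:
    # Squared error between predicted and reference force.
    return (force(r, k) - f_ref) ** 2


def grad_k_numeric(k: float, r: float, f_ref: float, h: float = 1e-4) -> float:
    # dL/dk by differentiating a loss that already contains dE/dr:
    # a derivative of a derivative.
    return (force_loss(k + h, r, f_ref) - force_loss(k - h, r, f_ref)) / (2 * h)


def grad_k_analytic(k: float, r: float, f_ref: float) -> float:
    # dL/dk = 2 (F - F_ref) * dF/dk, with dF/dk = -d2E/(dk dr) = -(r - R0).
    f = -k * (r - R0)
    return 2 * (f - f_ref) * (-(r - R0))


k, r, f_ref = 2.0, 1.3, -0.5
numeric = grad_k_numeric(k, r, f_ref)
analytic = grad_k_analytic(k, r, f_ref)
print(f"numeric dL/dk = {numeric:.6f}, analytic dL/dk = {analytic:.6f}")
```

In PyTorch the same double differentiation is typically requested by calling torch.autograd.grad on the energy with create_graph=True, so the returned forces stay differentiable; in JAX it follows from composing jax.grad. Scaling this second backward pass across many devices is the part the abstract says lacked a parallel framework.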
Problem

Research questions and friction points this paper is trying to address.

Universal Machine Learning Interatomic Potentials
billion-parameter models
second-order derivatives
distributed training
Exascale computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
distributed training
interatomic potentials
exascale computing
second-order derivatives
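
The Mixture-of-Experts idea listed first above can be illustrated with a generic top-k router: each input (for example a per-atom feature vector) is scored against every expert, only the k highest-scoring experts are evaluated, and their outputs are blended with softmax weights, so per-input compute grows with k rather than with the total number of experts. This sketch assumes linear experts, a linear router, and k = 2; the source does not describe the MatRIS-MoE router, expert count, or where expert layers sit in the invariant architecture, so none of this should be read as the authors' design.

```python
import numpy as np


def moe_forward(x, gate_w, expert_ws, k=2):
    """Generic top-k mixture-of-experts layer (hypothetical, not MatRIS-MoE).

    x:         (n_tokens, d) inputs, e.g. per-atom feature vectors
    gate_w:    (d, n_experts) linear router weights
    expert_ws: (n_experts, d, d) one linear map per expert
    """
    logits = x @ gate_w                              # (n_tokens, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]        # k highest-scoring experts per token
    sel = np.take_along_axis(logits, topk, axis=1)   # (n_tokens, k)
    # Softmax over the selected experts only, so each token's weights sum to 1.
    sel = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights = sel / sel.sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for slot in range(k):
        for e in np.unique(topk[:, slot]):
            rows = topk[:, slot] == e                # tokens sent to expert e in this slot
            out[rows] += weights[rows, slot][:, None] * (x[rows] @ expert_ws[e])
    return out, topk, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
gate_w = rng.normal(size=(4, 3))
expert_ws = rng.normal(size=(3, 4, 4))

out, topk, weights = moe_forward(x, gate_w, expert_ws, k=2)
print(out.shape, topk.shape, weights.sum(axis=1))
```

In expert-parallel training, experts typically reside on different devices, which turns the routing loop into an all-to-all exchange of inputs; whether Janus distributes experts this way is not stated here.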
Yuanchang Zhou
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Hongyu Wang
Institute of Computing Technology, Chinese Academy of Sciences
Deep Learning, Natural Language Processing, Computer Vision
Yiming Du
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yan Wang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Mingzhen Li
Institute of Computing Technology, Chinese Academy of Sciences
HPC, AI System
Siyu Hu
Institute of Computing Technology, Chinese Academy of Sciences
AI4S, HPC
Xiangyu Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Weijian Liu
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Chen Wang
Institute of Automation, Chinese Academy of Sciences
Zhuoqiang Guo
Independent Researcher
Long Wang
Independent Researcher
Jingde Bu
Independent Researcher
Yutong Lu
Sun Yat-Sen University
Guangming Tan
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Weile Jia
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences