🤖 AI Summary
Machine-learned interatomic potentials (MLIPs) often generalize poorly and yield unphysical energy and force predictions. To address this, the authors propose a physics-informed, weakly supervised training approach aimed at sparse training data sets, which remains applicable when reference forces cannot be computed. The method introduces two loss functions: (i) a term that extrapolates the potential energy via a Taylor expansion, and (ii) a term based on the concept of conservative forces. Together they improve accuracy on sparse data and reduce the need to pre-train computationally demanding models on large data sets. Across various baseline models and benchmark data sets, energy and force errors are reduced, often by a factor of two. The approach also supports training at reference levels where force computation is infeasible, such as those employing complete-basis-set extrapolation.
📝 Abstract
Machine learning plays an increasingly important role in computational chemistry and materials science, complementing computationally intensive ab initio and first-principles methods. Despite their utility, machine-learning models often lack generalization capability and robustness during atomistic simulations, yielding unphysical energy and force predictions that hinder their real-world application. We address this challenge by introducing a physics-informed, weakly supervised approach for training machine-learned interatomic potentials (MLIPs). We introduce two novel loss functions: one extrapolates the potential energy via a Taylor expansion, and the other uses the concept of conservative forces. Our approach improves the accuracy of MLIPs trained on sparse data sets and reduces the need for pre-training computationally demanding models with large data sets. In particular, extensive experiments demonstrate reduced energy and force errors, often lower by a factor of two, for various baseline models and benchmark data sets. Finally, we show that our approach facilitates the training of MLIPs in settings where the computation of forces is infeasible at the reference level, such as those employing complete-basis-set extrapolation.
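To make the two ideas concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy harmonic pair potential as a stand-in for an MLIP, and every name in it (`energy`, `forces`, `taylor_loss`, `work_along`, `closed_loop_loss`) is hypothetical. The first loss compares the energy predicted at a displaced configuration with the first-order Taylor target E(r + δ) ≈ E_ref(r) − F(r)·δ, so a single energy label supervises nearby configurations. The second uses one simple reading of "conservative forces", namely that the work done by the forces around a closed loop should vanish; the paper's exact formulation may differ.

```python
import math

def energy(pos, k=1.0, r0=1.0):
    """Toy stand-in for an MLIP energy: harmonic springs between all atom pairs."""
    return sum(
        0.5 * k * (math.dist(pos[i], pos[j]) - r0) ** 2
        for i in range(len(pos)) for j in range(i + 1, len(pos))
    )

def forces(pos, k=1.0, r0=1.0):
    """Analytic forces F = -dE/dr of the toy model (conservative by construction)."""
    f = [[0.0] * 3 for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = math.dist(pos[i], pos[j])
            coeff = -k * (d - r0) / d
            for a in range(3):
                u = pos[i][a] - pos[j][a]
                f[i][a] += coeff * u
                f[j][a] -= coeff * u
    return f

def dot(x, y):
    """Sum of per-atom, per-component products of two (n_atoms x 3) lists."""
    return sum(xa * ya for xi, yi in zip(x, y) for xa, ya in zip(xi, yi))

def shift(pos, delta, scale=1.0):
    """Return the configuration pos + scale * delta."""
    return [[p + scale * d for p, d in zip(pi, di)] for pi, di in zip(pos, delta)]

def taylor_loss(pos, e_ref, delta):
    """Squared gap between E(pos + delta) and the target e_ref - F(pos) . delta."""
    target = e_ref - dot(forces(pos), delta)
    return (energy(shift(pos, delta)) - target) ** 2

def work_along(path, steps=50):
    """Midpoint-rule estimate of the line integral of F . dr along a polyline."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        seg = [[qb - qa for qa, qb in zip(ra, rb)] for ra, rb in zip(a, b)]
        for s in range(steps):
            mid = shift(a, seg, (s + 0.5) / steps)
            total += dot(forces(mid), seg) / steps
    return total

def closed_loop_loss(a, b, c):
    """Squared work around the loop a -> b -> c -> a; near zero for conservative forces."""
    return work_along([a, b, c, a]) ** 2

# Hypothetical three-atom configuration and two small displacement fields.
R = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.9, 0.3]]
DELTA = [[0.01, -0.02, 0.0], [0.0, 0.01, 0.01], [-0.01, 0.0, 0.02]]
DELTA2 = [[0.0, 0.02, 0.01], [-0.01, 0.0, 0.0], [0.01, -0.01, 0.0]]
E_REF = energy(R)  # stands in for a single reference energy label

print("Taylor loss:", taylor_loss(R, E_REF, DELTA))
print("Closed-loop loss:", closed_loop_loss(R, shift(R, DELTA), shift(R, DELTA2)))
```

Because the toy forces are the exact gradient of the toy energy, both losses come out close to zero here. For a model whose forces are predicted independently of its energy, or whose energy surface is rough, they would be nonzero and could be added to the usual supervised loss. The Taylor target is only accurate for small displacements, so the size of δ is a hyperparameter in any such scheme.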