Optimal Invariant Bases for Atomistic Machine Learning

๐Ÿ“… 2025-03-30
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the fundamental trade-off between redundancy and expressiveness in atomic descriptors for machine learning potentials. We propose the Minimal Complete Invariant Basis (MCIB) framework, the first to introduce functional dependency analysis from pattern recognition into atomic descriptor design. By constructing Cartesian tensor invariants and rigorously eliminating functionally dependent components, we derive a descriptor subset that is redundancy-free, complete, and compact. We further enhance the Atomic Cluster Expansion (ACE) formalism and design a message-passing network with explicit five-body interaction awareness. Experiments demonstrate that the new descriptors reduce dimensionality by over 40% compared to standard ACE while maintaining state-of-the-art accuracy on the QM9 and MD17 benchmarks (MAE fluctuations < 2%). Moreover, inference speed improves by 2.3×, achieving a favorable balance among expressive power, generalization capability, and computational efficiency.

๐Ÿ“ Abstract
The representation of atomic configurations for machine learning models has led to the development of numerous descriptors, often to describe the local environment of atoms. However, many of these representations are incomplete and/or functionally dependent. Incomplete descriptor sets are unable to represent all meaningful changes in the atomic environment. Complete constructions of atomic environment descriptors, on the other hand, often suffer from a high degree of functional dependence, where some descriptors can be written as functions of the others. These redundant descriptors do not provide additional power to discriminate between different atomic environments and increase the computational burden. By applying techniques from the pattern recognition literature to existing atomistic representations, we remove descriptors that are functions of other descriptors to produce the smallest possible set that satisfies completeness. We apply this in two ways: first, we refine an existing description, the Atomistic Cluster Expansion, and show that this yields a more efficient subset of descriptors. Second, we augment an incomplete construction based on a scalar neural network, yielding a new message-passing network architecture that can recognize up to 5-body patterns in each neuron by taking advantage of an optimal set of Cartesian tensor invariants. This architecture shows strong accuracy on state-of-the-art benchmarks while retaining low computational cost. Our results not only yield improved models, but point the way to classes of invariant bases that minimize cost while maximizing expressivity for a host of applications.
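The core idea of the abstract, detecting which descriptors are functions of the others, can be illustrated with a standard numerical test from the pattern recognition literature: the number of functionally independent descriptors at a point is bounded by the rank of the descriptor map's Jacobian there. The toy descriptors below are illustrative only, not the paper's construction; a minimal sketch, assuming a simple 2D "environment":

```python
import numpy as np

# Toy descriptor map on a 2D point: the third descriptor equals the
# square of the first, so it is functionally dependent (redundant).
# These descriptors are hypothetical, chosen only to make the rank
# deficiency visible; they are not the paper's MCIB construction.
def descriptors(p):
    x, y = p
    d1 = x**2 + y**2          # radial invariant
    d2 = (x * y)**2           # another smooth descriptor
    d3 = (x**2 + y**2)**2     # = d1**2 -> redundant
    return np.array([d1, d2, d3])

def jacobian(f, p, eps=1e-6):
    """Central-difference Jacobian of f at point p."""
    p = np.asarray(p, dtype=float)
    m = len(f(p))
    J = np.zeros((m, len(p)))
    for j in range(len(p)):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (f(p + dp) - f(p - dp)) / (2 * eps)
    return J

# The maximal Jacobian rank over sample points bounds the number of
# functionally independent descriptors; descriptors beyond that rank
# can be eliminated without losing discriminative power.
rng = np.random.default_rng(0)
ranks = [np.linalg.matrix_rank(jacobian(descriptors, rng.normal(size=2)), tol=1e-8)
         for _ in range(20)]
print(max(ranks))  # 2: only two of the three descriptors are independent
```

In this sketch the gradient of `d3` is everywhere a scalar multiple of the gradient of `d1`, so the Jacobian never reaches full rank and one descriptor can be dropped.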
Problem

Research questions and friction points this paper is trying to address.

Develops optimal invariant bases for atomistic machine learning
Reduces redundancy in atomic environment descriptors
Enhances accuracy and efficiency in atomistic representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimize descriptors using pattern recognition techniques
Refine Atomistic Cluster Expansion for efficiency
Augment neural network with optimal tensor invariants
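The last bullet refers to Cartesian tensor invariants: rotation-invariant scalars built by contracting neighbor bond vectors. A minimal sketch of the idea (pairwise contractions give 2- and 3-body invariants; squared determinants of vector triples add higher body-order information), assuming simple full contractions rather than the paper's optimized basis:

```python
import numpy as np
from itertools import combinations, combinations_with_replacement

def cartesian_invariants(neighbors):
    """Rotation-invariant scalars from neighbor bond vectors.

    Illustrative only: dot products r_i . r_j are 2-/3-body invariants,
    and det([r_i; r_j; r_k])**2 survives rotations and reflections.
    The paper's optimal (redundancy-free) basis goes beyond this sketch.
    """
    R = np.asarray(neighbors, dtype=float)
    n = len(R)
    inv = [float(R[i] @ R[j])
           for i, j in combinations_with_replacement(range(n), 2)]
    inv += [float(np.linalg.det(R[[i, j, k]]) ** 2)
            for i, j, k in combinations(range(n), 3)]
    return np.array(inv)

# Invariance check: the features are unchanged under a random rotation
# of the whole neighborhood.
rng = np.random.default_rng(1)
nbrs = rng.normal(size=(4, 3))            # 4 neighbor bond vectors in 3D
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
a = cartesian_invariants(nbrs)
b = cartesian_invariants(nbrs @ Q.T)
print(np.allclose(a, b))  # True
```

Such contractions are exactly the kind of raw invariant pool from which a functionally independent subset must be selected, since many of these scalars are algebraically related to one another.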
๐Ÿ”Ž Similar Papers
No similar papers found.
Alice E. A. Allen
Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87546; Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545; Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
Emily Shinkle
Scientist, Los Alamos National Laboratory
Roxana Bujack
Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Nicholas Lubbers
Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory