🤖 AI Summary
This study addresses the low efficiency, high redundancy, and insufficient structural diversity of training-data construction for machine-learned interatomic potentials (MLIPs). It proposes the Difference Vector–Local Atomic Environment (DV-LAE) method, which quantifies local atomic environments with histograms and derives difference vectors from them. This representation supports both a quantitative assessment of redundancy and the visual identification of structurally informative configurations. DV-LAE combines local-environment encoding, difference-vector construction, histogram-based statistical analysis, and high-dimensional visualization to guide targeted dataset optimization. Applied to hydrogen under high-temperature, high-pressure conditions and to the α-Fe/H system, DV-LAE reduces computational cost significantly while preserving prediction accuracy. It also identifies physically important structures that were underrepresented in the initial training data, so that they can be added to the dataset to improve MLIP generalizability and physical fidelity.
📝 Abstract
This paper addresses the challenges of creating efficient and high-quality datasets for machine learning potential functions. We present a novel approach, termed DV-LAE (Difference Vectors based on Local Atomic Environments), which utilizes the properties of atomic local environments and employs histogram statistics to generate difference vectors. This technique facilitates dataset screening and optimization, effectively minimizing redundancy while maintaining data diversity. We have validated the optimized datasets in high-temperature and high-pressure hydrogen systems as well as the α-Fe/H binary system, demonstrating a significant reduction in computational resource usage without compromising prediction accuracy. Additionally, our method has revealed new structures that emerge during simulations but were underrepresented in the initial training datasets. Visualizing the difference vectors shows both the redundancy in the datasets and the distribution of these new structures. This approach enhances our understanding of the characteristics of these newly formed structures and their impact on physical processes.
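The screening idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the local-environment descriptor, the binning, or the selection rule, so the pair-distance histogram, the L1 norm, the greedy filter, and every name and parameter below (`distance_histogram`, `select_diverse`, `cutoff`, `n_bins`, `threshold`) are assumptions made for illustration only.

```python
import math

def distance_histogram(positions, cutoff=3.0, n_bins=12):
    """Normalized histogram of pair distances below `cutoff`.

    Assumption: a plain distance histogram stands in for the paper's
    local-atomic-environment descriptor; periodic images are ignored.
    """
    counts = [0] * n_bins
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = math.dist(positions[i], positions[j])
            if d < cutoff:
                # Clamp the index to guard against floating-point edge cases.
                counts[min(int(d / cutoff * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def difference_vector(hist, reference):
    """Bin-wise difference between two histograms."""
    return [h - r for h, r in zip(hist, reference)]

def select_diverse(histograms, threshold=0.2):
    """Greedy screening (hypothetical rule): keep a structure only if its
    difference vector against every already-kept structure has an L1 norm
    above `threshold`."""
    kept = []
    for idx, hist in enumerate(histograms):
        if all(
            sum(abs(x) for x in difference_vector(hist, histograms[k])) > threshold
            for k in kept
        ):
            kept.append(idx)
    return kept

def chain(spacing, n_atoms=4, origin=(0.0, 0.0, 0.0)):
    """Toy structure: atoms on a line along x with the given spacing."""
    ox, oy, oz = origin
    return [(ox + i * spacing, oy, oz) for i in range(n_atoms)]

# Three toy structures: a chain, a translated copy of it (redundant),
# and a chain with a shorter spacing (structurally different).
structures = [
    chain(1.1),
    chain(1.1, origin=(5.0, 5.0, 5.0)),
    chain(0.74),
]
histograms = [distance_histogram(s) for s in structures]
selected = select_diverse(histograms)
print(selected)
```

In this toy case the translated copy produces the same histogram as the first chain, so its difference vector is zero and it is screened out as redundant, while the chain with shorter spacing is retained. A real workflow would replace the toy descriptor with whatever local-environment encoding the method actually uses and would tune the threshold against model accuracy.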