Automated Machine Learning Pipeline for Training and Analysis Using Large Language Models

📅 2025-09-25
🤖 AI Summary
To address the heavy reliance on human expertise in data generation, model training, and validation for machine-learned interatomic potentials (MLIPs), this work introduces AMLP, an end-to-end automated MLIP pipeline that integrates large language model (LLM) agents, together with its analysis suite, AMLP-Analysis. Built on the MACE architecture and the Atomic Simulation Environment (ASE), the pipeline uses LLM agents to configure quantum-chemical calculations, invoke the electronic-structure codes, and parse their outputs, automating the path from *ab initio* data generation and model training to molecular dynamics (MD) deployment. Applied to acridine polymorphs, an MLIP obtained by fine-tuning a MACE foundation model achieves mean absolute errors of 1.7 meV/atom (energies) and 7.0 meV/Å (forces), reproduces DFT geometries with sub-Å accuracy, and remains stable over long MD runs in the NVE and NVT ensembles. The work brings LLM agents into a closed-loop MLIP development workflow, reducing the dependence on domain experts.
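One step the agents automate is converting electronic-structure output into labelled training data. A common target format for MACE training is extended XYZ, where each frame carries the cell, atomic positions, reference forces, and the reference energy. The sketch below is a minimal, hypothetical illustration of that conversion: the `Frame` container, the `to_extxyz` helper, and the label names `REF_energy`/`REF_forces` are assumptions, not AMLP's actual code, and the label names would need to match whatever keys the training configuration expects.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One labelled configuration parsed from a DFT output (hypothetical container)."""
    symbols: list[str]
    positions: list[tuple[float, float, float]]    # Cartesian positions, Å
    forces: list[tuple[float, float, float]]       # reference forces, eV/Å
    energy: float                                  # reference total energy, eV
    cell: tuple[tuple[float, float, float], ...]   # 3x3 lattice vectors, Å


def to_extxyz(frame: Frame, energy_key: str = "REF_energy",
              forces_key: str = "REF_forces") -> str:
    """Serialise one frame as an extended-XYZ block with energy and force labels.

    The key names are placeholders (assumed); they must match the keys the
    training script is told to read.
    """
    n = len(frame.symbols)
    if not (len(frame.positions) == len(frame.forces) == n):
        raise ValueError("symbols, positions and forces must have equal length")
    # Flatten the 3x3 cell row by row, as the Lattice="..." field expects.
    lattice = " ".join(f"{x:.8f}" for vec in frame.cell for x in vec)
    header = (
        f'Lattice="{lattice}" '
        f"Properties=species:S:1:pos:R:3:{forces_key}:R:3 "
        f'{energy_key}={frame.energy:.8f} pbc="T T T"'
    )
    lines = [str(n), header]
    # One line per atom: species, then x y z, then fx fy fz.
    for s, p, f in zip(frame.symbols, frame.positions, frame.forces):
        lines.append(f"{s} " + " ".join(f"{v:.8f}" for v in (*p, *f)))
    return "\n".join(lines) + "\n"
```

In practice, ASE's own extended-XYZ writer (`ase.io.write(..., format="extxyz")`) would normally handle this. The hand-rolled version only makes the layout of the format explicit.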

πŸ“ Abstract
Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet, developing reliable MLIPs remains difficult because it requires generating high-quality datasets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite (AMLP-Analysis), based on ASE, supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where, with straightforward fine-tuning of a foundation model, mean absolute errors of ~1.7 meV/atom in energies and ~7.0 meV/Å in forces are achieved. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and demonstrates stability during molecular dynamics simulations in the microcanonical and canonical ensembles.
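A common way to check MD stability in the microcanonical (NVE) ensemble is to monitor the total energy: a well-behaved potential and integrator keep it bounded, with no systematic drift. The toy sketch below illustrates that diagnostic, assuming a 1D harmonic oscillator as a stand-in for the MLIP forces and reduced units throughout. The helpers `run_nve` and `energy_drift` are hypothetical and not part of AMLP. The drift is estimated as the least-squares slope of total energy versus time. In a real ASE workflow the trajectory would instead come from an integrator such as `ase.md.verlet.VelocityVerlet` driven by the fitted potential, and the drift would typically be reported per atom.

```python
def run_nve(k: float, mass: float, x0: float, v0: float,
            dt: float, n_steps: int) -> list[float]:
    """Velocity-Verlet NVE run for a toy 1D harmonic oscillator.

    Stand-in for an MLIP-driven trajectory (assumption); returns the total
    energy after every step.
    """
    x, v = x0, v0
    f = -k * x                       # initial force from V(x) = k x^2 / 2
    energies = []
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass     # first half kick
        x += dt * v                  # full position update
        f = -k * x                   # force at the new position
        v += 0.5 * dt * f / mass     # second half kick
        energies.append(0.5 * mass * v * v + 0.5 * k * x * x)
    return energies


def energy_drift(energies: list[float], dt: float) -> float:
    """Least-squares slope of total energy versus time (energy per time unit)."""
    n = len(energies)
    if n < 2:
        raise ValueError("need at least two energies to fit a slope")
    ts = [i * dt for i in range(n)]
    t_mean = sum(ts) / n
    e_mean = sum(energies) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in zip(ts, energies))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den
```

Because velocity Verlet is symplectic, the total energy in this toy run oscillates slightly around its initial value, and the fitted slope is essentially zero. A steadily growing slope in a real run would point to problems such as too large a time step or an unstable potential.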
Problem

Automates MLIP development from dataset creation to validation
Uses LLM agents for electronic-structure code selection and input preparation
Achieves near-quantum accuracy by fine-tuning a foundation model on acridine polymorphs
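The accuracy figures quoted for this work (~1.7 meV/atom for energies, ~7.0 meV/Å for forces) are mean absolute errors (MAEs) against DFT references. The sketch below shows one common way to compute them, assuming energies in eV normalised by the number of atoms in each structure, and forces in eV/Å averaged over every Cartesian component. Some codes instead average the per-atom force-error norm, and the paper's exact convention is not stated here. The function names are hypothetical.

```python
def energy_mae_per_atom(e_pred: list[float], e_ref: list[float],
                        n_atoms: list[int]) -> float:
    """MAE of per-atom energies in meV/atom (inputs: total energies in eV)."""
    if not (len(e_pred) == len(e_ref) == len(n_atoms)) or not e_pred:
        raise ValueError("inputs must be non-empty and of equal length")
    errs = [abs(p - r) / n for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return 1000.0 * sum(errs) / len(errs)   # eV/atom -> meV/atom


def force_mae(f_pred: list[tuple[float, float, float]],
              f_ref: list[tuple[float, float, float]]) -> float:
    """MAE over all Cartesian force components in meV/Å (inputs in eV/Å).

    Assumes per-component averaging; f_pred and f_ref list the force vectors
    of every atom in every structure, in the same order.
    """
    if len(f_pred) != len(f_ref) or not f_pred:
        raise ValueError("inputs must be non-empty and of equal length")
    errs = [abs(p - r) for fp, fr in zip(f_pred, f_ref)
            for p, r in zip(fp, fr)]
    return 1000.0 * sum(errs) / len(errs)   # eV/Å -> meV/Å
```

Normalising energies per atom makes errors comparable across structures of different size, such as polymorph unit cells with different numbers of molecules.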
Innovation

Automated pipeline unifies dataset creation to model validation
Large-language-model agents assist with electronic-structure code selection, input preparation, and output conversion
MACE architecture with fine-tuning achieves near-quantum accuracy
πŸ”Ž Similar Papers
No similar papers found.
A
Adam Lahouari
NYU, Department of Chemistry, New York, NY 10003, USA
Jutta Rogal
Flatiron Institute
enhanced sampling, dimensionality reduction, machine learning for molecular physics, materials
Mark E. Tuckerman
NYU, Department of Chemistry, New York, NY 10003, USA; NYU, Department of Physics, New York, NY 10003, USA; Courant Institute of Mathematical Sciences, NYU, NY 10012, USA; NYU-ECNU Center for Computational Chemistry, Shanghai 200062, China; Simons Center for Computational Physical Chemistry, NYU, NY 10003, USA