Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications

📅 2026-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes an automated descriptor-design method that predicts the band gap and ferromagnetic Curie temperature of inorganic materials from their chemical formulas alone, without manual feature engineering during the run. Following the autoresearch paradigm, a framework named Automat uses OpenAI Codex (with GPT-5.5) as a coding agent that iteratively proposes, implements, and evaluates task-specific descriptors within a random forest workflow. The resulting composition-based descriptor families are competitive in predictive performance and remain chemically interpretable. On both prediction tasks, Automat improves over the fractional-composition, Magpie, and combined fractional-composition/Magpie baselines, which rely on handcrafted or template-driven feature engineering.
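To make the baselines named above concrete, here is a minimal sketch of a fractional-composition descriptor and a Magpie-style composition-weighted statistic. It is an illustration only, not the paper's code: the function names, the tiny atomic-number table, and the restriction to flat formulas without parentheses are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Atomic numbers for a handful of elements (illustrative subset only;
# a real workflow would use a complete element-property table).
ATOMIC_NUMBER = {"O": 8, "Fe": 26, "Ga": 31, "As": 33}

# Element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def fractional_composition(formula):
    """Map a flat formula such as 'Fe2O3' to {element: atomic fraction}.

    Assumption: no parentheses, hydrates, or charges in the formula.
    """
    amounts = defaultdict(float)
    for symbol, count in _TOKEN.findall(formula):
        amounts[symbol] += float(count) if count else 1.0
    total = sum(amounts.values())
    return {element: n / total for element, n in amounts.items()}


def weighted_stats(fractions, prop):
    """Composition-weighted mean and range of one elemental property."""
    values = [prop[element] for element in fractions]
    mean = sum(frac * prop[element] for element, frac in fractions.items())
    return {"mean": mean, "range": max(values) - min(values)}


fractions = fractional_composition("Fe2O3")
stats = weighted_stats(fractions, ATOMIC_NUMBER)
```

For `Fe2O3` the fractions are 0.4 for Fe and 0.6 for O, so the weighted mean atomic number is 0.4 × 26 + 0.6 × 8 = 15.2 and the range is 18. Both descriptors use only information derivable from the formula, which is the constraint the agent operates under.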
📝 Abstract
Autoresearch offers a flexible paradigm for automating scientific tasks, in which an AI agent proposes, implements, evaluates, and refines candidate solutions against a quantitative objective. Here, we use composition-based materials-property prediction to test whether such agents can perform a task beyond model selection and hyperparameter optimization: the design of input descriptors. We introduce Automat, an autoresearch framework where a coding agent based on a large language model generates composition-only descriptors for chemical compounds and evaluates them using a random forest workflow. The agent is restricted to information derivable from chemical formulas and iteratively proposes, implements, and tests chemically motivated descriptor strategies. We apply Automat, with OpenAI Codex using GPT-5.5 as the coding agent, to the prediction of experimental band gaps in inorganic materials and Curie temperatures in ferromagnetic compounds. In both tasks, Automat improves over fractional-composition, Magpie, and combined fractional-composition/Magpie baselines, while producing descriptor families that are chemically interpretable. These results provide a demonstration that autoresearch agents can generate competitive, task-specific materials descriptors without manual feature engineering during the run. They also reveal current limitations, including descriptor redundancy, sensitivity to greedy feature expansion, and the need for explicit complexity control, descriptor pruning, and more sophisticated search strategies.
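The greedy feature expansion and descriptor redundancy mentioned in the abstract can be illustrated with a small sketch. This is a generic greedy forward-selection loop, not the Automat implementation: the descriptor-family names are hypothetical, and `toy_error` is an invented stand-in for the random forest validation error, with made-up numbers that are not results from the paper.

```python
def greedy_expand(candidates, evaluate, min_gain=0.0):
    """Add descriptor families one at a time while the error keeps dropping.

    candidates: list of descriptor-family names (hypothetical labels).
    evaluate:   callable mapping a list of names to an error (lower is better).
    min_gain:   minimum error reduction required to accept a new family.
    """
    selected = []
    best_error = evaluate(selected)
    history = [(tuple(selected), best_error)]
    remaining = list(candidates)
    while remaining:
        # Score every remaining family when added to the current set.
        trials = [(evaluate(selected + [name]), name) for name in remaining]
        error, choice = min(trials)
        if best_error - error <= min_gain:
            break  # no candidate helps enough: stop expanding
        selected.append(choice)
        remaining.remove(choice)
        best_error = error
        history.append((tuple(selected), best_error))
    return selected, history


def toy_error(names):
    """Invented evaluator: families in the same group carry the same signal."""
    gains = {"valence": 0.30, "electronegativity": 0.20, "valence_copy": 0.30}
    groups = {"valence": "v", "valence_copy": "v", "electronegativity": "e"}
    best_per_group = {}
    for name in names:
        group = groups[name]
        best_per_group[group] = max(best_per_group.get(group, 0.0), gains[name])
    return 1.0 - sum(best_per_group.values())


selected, history = greedy_expand(
    ["valence", "electronegativity", "valence_copy"], toy_error, min_gain=1e-9
)
```

In this toy setup the loop accepts `valence` and `electronegativity` and then stops, because `valence_copy` is redundant and yields no further gain. With `min_gain` left at a negative value, the redundant family would be accepted as well, which is one way to see why explicit complexity control and pruning matter in such a search.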
Problem

Research questions and friction points this paper is trying to address.

autoresearch
compositional descriptors
materials property prediction
descriptor design
automated feature engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

autoresearch
compositional descriptors
AI agent
materials informatics
automated feature engineering