Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

📅 2025-06-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Inorganic materials synthesis planning has long been constrained by heuristic expertise and by the limited generalizability of data-driven models trained on small datasets. To address this, we show that general-purpose large language models (e.g., GPT-4.1, Gemini 2.0 Flash, Llama 4 Maverick) can predict synthesis conditions, namely precursor combinations and calcination and sintering temperatures, without task-specific fine-tuning. On a held-out set of 1,000 reactions, these off-the-shelf models reach a Top-1 precursor prediction accuracy of up to 53.8% and a Top-5 accuracy of 66.1%, and they predict temperatures with mean absolute errors (MAE) below 126 °C. We further combine 28,548 LLM-generated synthetic recipes with literature-mined data to pretrain a transformer-based model, SyntMTE. After fine-tuning, SyntMTE reduces the MAE to 73 °C for sintering temperature and 98 °C for calcination temperature, an improvement of up to 8.7% over baselines trained only on experimental data. In a case study on Li7La3Zr2O12 (LLZO) solid-state electrolytes, it also reproduces the experimentally observed dopant-dependent sintering trends.

📝 Abstract
Inorganic synthesis planning currently relies primarily on heuristic approaches or machine-learning models trained on limited datasets, which constrains its generality. We demonstrate that language models, without task-specific fine-tuning, can recall synthesis conditions. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash and Llama 4 Maverick, achieve a Top-1 precursor-prediction accuracy of up to 53.8 % and a Top-5 performance of 66.1 % on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching specialized regression methods. Ensembling these language models further enhances predictive accuracy and reduces inference cost per prediction by up to 70 %. We subsequently employ language models to generate 28,548 synthetic reaction recipes, which we combine with literature-mined examples to pretrain a transformer-based model, SyntMTE. After fine-tuning on the combined dataset, SyntMTE reduces mean-absolute error in sintering temperature prediction to 73 °C and in calcination temperature to 98 °C. This strategy improves models by up to 8.7 % compared with baselines trained exclusively on experimental data. Finally, in a case study on Li7La3Zr2O12 solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable, data-efficient inorganic synthesis planning.
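The two metrics quoted in the abstract, Top-k precursor accuracy and mean absolute error, can be illustrated with a minimal sketch. The function names, the toy reactions, and the exact-set matching rule are assumptions made for illustration; the abstract does not state how a predicted precursor set is matched against the reference.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[Sequence[str]]],
    targets: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of reactions whose reference precursor set appears among
    the first k ranked candidate sets (order within a set is ignored).

    Assumption: a hit requires an exact match of the whole precursor set.
    """
    hits = 0
    for candidates, target in zip(ranked_predictions, targets):
        top_k = [frozenset(c) for c in candidates[:k]]
        if frozenset(target) in top_k:
            hits += 1
    return hits / len(targets)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference temperatures."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy data, invented for illustration only (not taken from the paper).
ranked = [
    [["Li2CO3", "La2O3", "ZrO2"], ["LiOH", "La2O3", "ZrO2"]],
    [["BaO", "TiO2"], ["BaCO3", "TiO2"]],
]
targets = [["La2O3", "Li2CO3", "ZrO2"], ["BaCO3", "TiO2"]]

print(top_k_accuracy(ranked, targets, k=1))  # 0.5
print(top_k_accuracy(ranked, targets, k=2))  # 1.0
print(mean_absolute_error([1200.0, 950.0], [1230.0, 900.0]))  # 40.0
```

In this toy case the second reaction is only recovered at rank 2, which is why Top-2 accuracy exceeds Top-1; the same effect explains the gap between the reported Top-1 and Top-5 figures.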
Problem

Research questions and friction points this paper is trying to address.

Predicting inorganic material synthesis conditions using language models
Enhancing accuracy of calcination and sintering temperature predictions
Generating synthetic reaction recipes for improved data efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language models predict inorganic synthesis conditions
Ensemble models enhance accuracy and reduce costs
Transformer-based SyntMTE improves temperature predictions
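The page does not say how the language models are combined into an ensemble. As a hedged sketch of one common scheme (an assumption, not the authors' method), temperatures can be averaged across models and precursor rankings can be merged by reciprocal-rank voting. The model names and values below are placeholders.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def ensemble_temperature(predictions_by_model: Dict[str, float]) -> float:
    """Average the temperature predicted by each model (assumed scheme)."""
    return mean(predictions_by_model.values())


def ensemble_precursors(
    rankings_by_model: Dict[str, Sequence[Sequence[str]]], k: int
) -> List[List[str]]:
    """Merge ranked precursor sets by reciprocal-rank voting (assumed scheme).

    Each model gives a candidate set a score of 1 / (rank + 1); scores are
    summed across models and the k highest-scoring sets are returned.
    """
    scores: Dict[frozenset, float] = defaultdict(float)
    for ranking in rankings_by_model.values():
        for rank, candidate in enumerate(ranking):
            scores[frozenset(candidate)] += 1.0 / (rank + 1)
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], sorted(item[0])))
    return [sorted(candidate) for candidate, _ in ordered[:k]]


# Placeholder model outputs, invented for illustration only.
temperatures = {"model_a": 1200.0, "model_b": 1250.0, "model_c": 1150.0}
rankings = {
    "model_a": [["BaCO3", "TiO2"], ["BaO", "TiO2"]],
    "model_b": [["BaO", "TiO2"], ["BaCO3", "TiO2"]],
    "model_c": [["BaCO3", "TiO2"], ["Ba(NO3)2", "TiO2"]],
}

print(ensemble_temperature(temperatures))  # 1200.0
print(ensemble_precursors(rankings, k=2))  # [['BaCO3', 'TiO2'], ['BaO', 'TiO2']]
```

Averaging tends to cancel uncorrelated errors between models, which is one plausible reason an ensemble could improve temperature predictions; the reported cost reduction is not modeled here.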
Thorben Prein
Technische Universität München
Materials Informatics
Elton Pan
PhD Candidate, MIT
generative models, reinforcement learning, materials informatics, materials synthesis
Janik Jehkul
Technische Universität München
Steffen Weinmann
Technische Universität München, Garching b. München, Germany
Elsa A. Olivetti
Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Jennifer L. M. Rupp
Technische Universität München, Garching b. München, Germany; TUMint. Energy Research GmbH, Munich, Germany