AI-Driven Expansion and Application of the Alexandria Database

📅 2025-12-09
🤖 AI Summary
Problem: Efficiently identifying thermodynamically stable inorganic compounds, and expanding the databases that catalogue them, remains a key challenge in materials discovery. Method: We propose a multi-stage, AI-driven workflow that combines the Matra-Genoa generative model, the Orb-v2 universal machine learning interatomic potential, and the ALIGNN graph neural network for energy prediction into a generation–screening–validation pipeline. Contributions/Results: The workflow achieves a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, a threefold improvement over previous approaches. From 119 million generated candidates it yields 1.3 million DFT-validated compounds, including 74,000 new stable materials, which expands the ALEXANDRIA database to 5.8 million structures with 175,000 compounds on the convex hull. Predicted structural disorder rates (37–43%) match experimental databases, unlike other recent AI-generated datasets, and the analysis reveals sub-linear scaling of convex hull connectivity. We release the complete dataset, including sAlex25 with 14 million out-of-equilibrium structures containing forces and stresses, and show that fine-tuning a GRACE model on this data improves benchmark accuracy.

📝 Abstract
We present a novel multi-stage workflow for computational materials discovery that achieves a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, with a threefold improvement over previous approaches. By combining the Matra-Genoa generative model, Orb-v2 universal machine learning interatomic potential, and ALIGNN graph neural network for energy prediction, we generated 119 million candidate structures and added 1.3 million DFT-validated compounds to the ALEXANDRIA database, including 74 thousand new stable materials. The expanded ALEXANDRIA database now contains 5.8 million structures with 175 thousand compounds on the convex hull. Predicted structural disorder rates (37-43%) match experimental databases, unlike other recent AI-generated datasets. Analysis reveals fundamental patterns in space group distributions, coordination environments, and phase stability networks, including sub-linear scaling of convex hull connectivity. We release the complete dataset, including sAlex25 with 14 million out-of-equilibrium structures containing forces and stresses for training universal force fields. We demonstrate that fine-tuning a GRACE model on this data improves benchmark accuracy. All data, models, and workflows are freely available under Creative Commons licenses.
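The stability criterion quoted in the abstract (within 100 meV/atom of the convex hull) can be illustrated with a minimal sketch for a binary A-B system. This is not the authors' implementation: the function names `lower_hull`, `hull_energy`, and `energy_above_hull`, the toy formation energies, and the restriction to two elements are all assumptions made for illustration. Real phase diagrams are multi-component and are usually built with dedicated tools.

```python
from typing import Dict, List, Tuple

# (fraction of element B, formation energy in eV/atom); all values are made up
Point = Tuple[float, float]

def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of a binary system (monotone-chain scan)."""
    # Keep only the lowest-energy entry at each composition.
    best: Dict[float, float] = {}
    for x, e in points:
        if x not in best or e < best[x]:
            best[x] = e
    hull: List[Point] = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:  # hull[-1] lies on or above the chord, so drop it
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def hull_energy(x: float, hull: List[Point]) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return y1 + t * (y2 - y1)
    raise ValueError("composition outside the range covered by the hull")

def energy_above_hull(x: float, e: float, hull: List[Point]) -> float:
    """Distance to the hull in eV/atom (zero for phases on the hull)."""
    return e - hull_energy(x, hull)

# Hypothetical entries: elemental references at 0 eV/atom plus three compounds.
entries: List[Point] = [(0.0, 0.0), (1.0, 0.0), (0.5, -0.40), (0.25, -0.15), (0.75, -0.30)]
hull = lower_hull(entries)

THRESHOLD = 0.100  # eV/atom, i.e. the 100 meV/atom window from the abstract
near_stable = [(x, e) for x, e in entries if energy_above_hull(x, e, hull) <= THRESHOLD]
```

In this toy data the compound at x = 0.25 sits above the hull but inside the 100 meV/atom window, so it counts as near-stable without being one of the hull compounds.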
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-stage AI workflow for high-accuracy materials discovery
Expands the ALEXANDRIA database with 1.3 million DFT-validated compounds, including 74 thousand new stable materials
Analyzes structural and stability patterns to improve predictive models
Innovation

Methods, ideas, or system contributions that make the work stand out.

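Multi-stage AI workflow with a 99% success rate at finding compounds within 100 meV/atom of thermodynamic stability
Combines the Matra-Genoa generative model, the Orb-v2 machine learning interatomic potential, and the ALIGNN graph neural network
Releases sAlex25, 14 million out-of-equilibrium structures with forces and stresses for training universal force fields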
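As a rough illustration of how such a multi-stage funnel could be wired together, the sketch below passes candidates through a relaxation step and an energy-prediction step, keeping those under a cutoff for later DFT validation. Everything here is hypothetical: `Candidate`, `run_funnel`, the toy stand-in functions, the stage ordering, and the 0.1 eV/atom cutoff are assumptions, and none of it reflects the actual Matra-Genoa, Orb-v2, or ALIGNN interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for a generated structure."""
    formula: str
    structure_id: int

def run_funnel(
    candidates: Iterable[Candidate],
    relax: Callable[[Candidate], Candidate],
    predict_e_hull: Callable[[Candidate], float],
    threshold_ev: float = 0.1,  # assumed cutoff in eV/atom
) -> List[Candidate]:
    """Relax each candidate, predict its hull distance, keep the promising ones."""
    shortlist: List[Candidate] = []
    for cand in candidates:
        relaxed = relax(cand)  # stand-in for a machine-learning-potential relaxation
        if predict_e_hull(relaxed) <= threshold_ev:  # stand-in for a GNN energy model
            shortlist.append(relaxed)
    return shortlist  # these would then go on to DFT validation

# Toy stand-ins so the sketch runs end to end; the numbers are invented.
TOY_E_HULL: Dict[int, float] = {1: 0.02, 2: 0.35, 3: 0.09}

def toy_relax(cand: Candidate) -> Candidate:
    return cand  # a real relaxation would update the geometry

def toy_predict(cand: Candidate) -> float:
    return TOY_E_HULL[cand.structure_id]

pool = [Candidate("A2B", 1), Candidate("AB", 2), Candidate("AB3", 3)]
shortlist = run_funnel(pool, toy_relax, toy_predict)
```

Passing the stages in as callables keeps the funnel independent of any particular model, so a cheaper or more accurate predictor can be swapped in without touching the control flow.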
Théo Cavignac
Research Center Future Energy Materials and Systems of the University Alliance Ruhr and ICAMS, Ruhr University Bochum, Universitätsstraße 150, D-44801 Bochum, Germany
Jonathan Schmidt
Department of Materials, ETH Zürich, Zürich, CH-8093, Switzerland
Pierre-Paul De Breuck
Research Center Future Energy Materials and Systems of the University Alliance Ruhr and ICAMS, Ruhr University Bochum, Universitätsstraße 150, D-44801 Bochum, Germany
Antoine Loew
Research Center Future Energy Materials and Systems of the University Alliance Ruhr and ICAMS, Ruhr University Bochum, Universitätsstraße 150, D-44801 Bochum, Germany
Tiago F. T. Cerqueira
CFisUC, Department of Physics, University of Coimbra, Rua Larga, 3004-516 Coimbra, Portugal
Hai-Chen Wang
Research Center Future Energy Materials and Systems of the University Alliance Ruhr and ICAMS, Ruhr University Bochum, Universitätsstraße 150, D-44801 Bochum, Germany
Anton Bochkarev
ICAMS, Ruhr-Universität Bochum, Universitätstrasse 150, 44801 Bochum, Germany and ACEworks GmbH, Hagen-Hof-Weg 1, 44797 Bochum, Germany
Yury Lysogorskiy
ICAMS, Ruhr University Bochum
interatomic potentials, machine learning
Aldo H. Romero
Department of Physics, West Virginia University, Morgantown, WV 26506, USA
Ralf Drautz
ICAMS, Ruhr-Universität Bochum, Universitätstrasse 150, 44801 Bochum, Germany and ACEworks GmbH, Hagen-Hof-Weg 1, 44797 Bochum, Germany
Silvana Botti
Research Center Future Energy Materials and Systems of the University Alliance Ruhr and ICAMS, Ruhr University Bochum, Universitätsstraße 150, D-44801 Bochum, Germany
Miguel A. L. Marques
Research Center Future Energy Materials and Systems of the University Alliance Ruhr and ICAMS, Ruhr University Bochum, Universitätsstraße 150, D-44801 Bochum, Germany