General purpose models for the chemical sciences

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
The chemical sciences produce diverse, small, and fuzzy datasets that conventional machine learning, which typically relies on large labeled datasets, struggles to exploit fully. This review examines general-purpose models (GPMs), such as large language models, which can solve tasks they were not directly trained on and can work flexibly with small amounts of data in different formats. It discusses the fundamental building principles of GPMs and surveys recent applications in the chemical sciences across the entire scientific process. Many of these applications are still prototypes, and the authors expect growing interest in GPMs to bring many of them to maturity in the coming years.

📝 Abstract
Data-driven techniques have great potential to transform and accelerate the chemical sciences. However, the chemical sciences also pose the unique challenge of very diverse, small, fuzzy datasets that are difficult to leverage fully with conventional machine learning approaches. A new class of models, general-purpose models (GPMs) such as large language models, has shown the ability to solve tasks they have not been directly trained on and to operate flexibly with small amounts of data in different formats. In this review, we discuss the fundamental building principles of GPMs and review recent applications of these models in the chemical sciences across the entire scientific process. While many of these applications are still in the prototype phase, we expect that the increasing interest in GPMs will make many of them mature in the coming years.
Problem

Research questions and friction points this paper is trying to address.

Addressing diverse, small, fuzzy chemical datasets
Leveraging general-purpose models for chemical tasks
Exploring GPM applications across chemical sciences
Innovation

Methods, ideas, or system contributions that make the work stand out.

General-purpose models for diverse chemical datasets
Leveraging large language models in chemistry
Flexible operation with small amounts of data
Authors

Nawaf Alampara
PhD Researcher, Friedrich Schiller University Jena
machine learning, ai4science, accelerating research, computational material science

Anagha Aneesh
Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany

Martiño Ríos-García
Friedrich-Schiller-Universität Jena
digital chemistry, machine learning, ai agents

Adrian Mirza
PhD researcher, FSU Jena & HIPOLE Jena
Machine Learning, Computational Chemistry, Data Science

Mara Schilling-Wilhelmi
Friedrich-Schiller-Universität Jena
Polymer Chemistry, Machine Learning

Ali Asghar Aghajani
Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany

Meiling Sun
Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany

Gordan Prastalo
Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany; Helmholtz Institute for Polymers in Energy Applications Jena (HIPOLE Jena), Lessingstrasse 12-14, 07743 Jena, Germany

Kevin Maik Jablonka
FSU Jena & HIPOLE Jena
digital chemistry, AI for science, model evaluations