🤖 AI Summary
Existing molecular generation models generalize poorly to unseen chemical space, struggle to jointly optimize molecular performance and synthetic accessibility, and predict out-of-distribution (OOD) properties inaccurately. To address these limitations, we propose a closed-loop active-learning framework for molecular generation that integrates a graph neural network–based generator, density functional theory (DFT)-derived quantum-chemical feedback, conditional generation, and an active-learning strategy enabling co-iterative optimization of the generator and property predictor. Our approach achieves, for the first time, significant extrapolation in molecular property space (+0.44σ beyond the training distribution), improves OOD classification accuracy by 79%, and generates thermodynamically stable molecules at 3.5× the rate of the next-best baseline, substantially outperforming state-of-the-art methods.
📝 Abstract
Although generative models hold promise for discovering molecules with desired properties, they often fail to suggest synthesizable molecules that improve upon the known molecules seen during training. We find that the key limitation lies not in the molecule generation process itself, but in the poor generalization of molecular property predictors. We tackle this challenge with an active-learning, closed-loop molecule generation pipeline, in which molecular generative models are iteratively refined using feedback from quantum-chemical simulations to improve generalization to new chemical space. Compared with other generative approaches, only our active-learning approach generates molecules whose properties extrapolate beyond the training data (reaching up to 0.44 standard deviations beyond the training range), and it improves out-of-distribution molecule classification accuracy by 79%. By conditioning molecular generation on thermodynamic-stability data from the active-learning loop, we generate stable molecules at 3.5× the rate of the next-best model.
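The closed loop described in the abstract can be illustrated with a minimal toy sketch: a generator proposes candidates near the current best region, a cheap surrogate predictor screens them, an expensive oracle (DFT in the paper; a toy quadratic here) labels the top picks, and both the predictor's training set and the generation target are refined each round. All function names and the one-dimensional "molecule" encoding below are illustrative assumptions, not the authors' implementation.

```python
import random

def dft_oracle(x):
    """Expensive ground-truth property; stand-in for a DFT calculation."""
    return x * x  # toy property to maximize

def surrogate_predict(labelled, x):
    """Cheap property predictor: nearest-neighbour lookup over labelled data."""
    nearest = min(labelled, key=lambda pt: abs(pt[0] - x))
    return nearest[1]

def generate_candidates(center, n=20, spread=1.0):
    """Generator proposes 'molecules' near the current best region."""
    return [random.gauss(center, spread) for _ in range(n)]

random.seed(0)
# Small initial training set labelled by the oracle
labelled = [(x, dft_oracle(x)) for x in (-1.0, 0.0, 1.0)]
center = 1.0

for _ in range(5):  # closed active-learning loop
    candidates = generate_candidates(center)
    # Acquisition: rank candidates by the surrogate's predicted property
    candidates.sort(key=lambda x: surrogate_predict(labelled, x), reverse=True)
    batch = candidates[:3]
    # Label the selected batch with the expensive oracle; refine the predictor
    labelled.extend((x, dft_oracle(x)) for x in batch)
    # Condition the next generation round on the best labelled point so far
    center = max(labelled, key=lambda pt: pt[1])[0]

best = max(labelled, key=lambda pt: pt[1])
```

Because each round re-centres generation on the best oracle-labelled point, the loop can drift beyond the initial training range, which mirrors the extrapolation behaviour the paper reports, albeit in a trivial one-dimensional setting.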