🤖 AI Summary
Existing molecular generation models generalize poorly to unseen chemical space, struggle to jointly optimize molecular performance and synthetic accessibility, and predict out-of-distribution (OOD) properties inaccurately. To address these limitations, we propose a closed-loop active-learning framework for molecular generation that integrates a graph neural network–based generator, density functional theory (DFT)-derived quantum-chemical feedback, conditional generation, and an active-learning strategy enabling co-iterative optimization of the generator and property predictor. Our approach achieves, for the first time, significant extrapolation in molecular property space (+0.44σ beyond the training distribution), improves OOD classification accuracy by 79%, and generates thermodynamically stable molecules at 3.5× the rate of the next-best baseline, substantially outperforming state-of-the-art methods.
📝 Abstract
Although generative models hold promise for discovering molecules with desired properties, they often fail to suggest synthesizable molecules that improve upon the known molecules seen during training. We find that the key limitation lies not in the molecule generation process itself, but in the poor generalization of molecular property predictors. We tackle this challenge with an active-learning, closed-loop molecule generation pipeline, in which molecular generative models are iteratively refined using feedback from quantum-chemical simulations to improve generalization to new chemical space. Compared with other generative approaches, only our active-learning approach generates molecules whose properties extrapolate beyond the training data (reaching up to 0.44 standard deviations beyond the training range), and it improves out-of-distribution molecule classification accuracy by 79%. By conditioning molecular generation on thermodynamic-stability data from the active-learning loop, we generate stable molecules at 3.5× the rate of the next-best model.
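The closed loop described in the abstract can be illustrated with a minimal toy sketch: a generator proposes candidates near the current best region, a cheap surrogate predictor screens them, an expensive oracle (DFT in the paper; a toy quadratic here) labels the top picks, and both the predictor's training set and the generation target are refined each round. All function names and the one-dimensional "molecule" encoding below are illustrative assumptions, not the authors' implementation.

```python
import random

def dft_oracle(x):
    """Expensive ground-truth property; stand-in for a DFT calculation."""
    return x * x  # toy property to maximize

def surrogate_predict(labelled, x):
    """Cheap property predictor: nearest-neighbour lookup over labelled data."""
    nearest = min(labelled, key=lambda pt: abs(pt[0] - x))
    return nearest[1]

def generate_candidates(center, n=20, spread=1.0):
    """Generator proposes 'molecules' near the current best region."""
    return [random.gauss(center, spread) for _ in range(n)]

random.seed(0)
# Small initial training set labelled by the oracle
labelled = [(x, dft_oracle(x)) for x in (-1.0, 0.0, 1.0)]
center = 1.0

for _ in range(5):  # closed active-learning loop
    candidates = generate_candidates(center)
    # Acquisition: rank candidates by the surrogate's predicted property
    candidates.sort(key=lambda x: surrogate_predict(labelled, x), reverse=True)
    batch = candidates[:3]
    # Label the selected batch with the expensive oracle; refine the predictor
    labelled.extend((x, dft_oracle(x)) for x in batch)
    # Condition the next generation round on the best labelled point so far
    center = max(labelled, key=lambda pt: pt[1])[0]

best = max(labelled, key=lambda pt: pt[1])
```

Because each round re-centres generation on the best oracle-labelled point, the loop can drift beyond the initial training range, which mirrors the extrapolation behaviour the paper reports, albeit in a trivial one-dimensional setting.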