🤖 AI Summary
This work addresses the limited out-of-distribution (OOD) generalization of existing statistical filters for molecular synthesizability, which degrade on the novel molecules proposed by generative models. The authors propose a multitask learning framework that adds two closed-form physical priors, the Bertz topological complexity index and MMFF94 force-field strain energy, as auxiliary supervision signals on a GINE backbone: one loss regresses topological complexity and the other applies a soft penalty on strain energy. On the COCONUT natural-products OOD test set, the combined model improves AUC over the baseline (mean OOD AUC 0.9774) by a small but statistically significant 0.0066 (95% CI [+0.0038, +0.0093]), while the variants are indistinguishable in-distribution. The authors stress that the effect is modest and that multi-seed evaluation was needed to reach a robust conclusion, since a single-seed run told a qualitatively different story.
📝 Abstract
Machine-learning drug-discovery pipelines increasingly rely on generative models that propose molecules far from the data used to train downstream synthesizability filters. Existing
filters (SAScore, SCScore, RAscore, DeepSA) are purely statistical and degrade in exactly this out-of-distribution (OOD) regime. We ask whether cheap, closed-form physical priors, used
as auxiliary supervision on a graph neural network (GNN), improve OOD generalization. We add two auxiliary losses to a GINE backbone: a topological complexity regression supervised by
the Bertz index, and a strain-energy soft penalty supervised by MMFF94 force-field energy. On a 65,177-molecule corpus (HIV, Tox21, COCONUT) labeled by SAScore thresholds, we reproduce
a strong in-distribution baseline, then evaluate a 4-way ablation (baseline / +complexity / +strain / +both) on a single-source OOD split (train on drug-like HIV+Tox21, test on
COCONUT natural products), repeated over 5 seeds with paired bootstrap confidence intervals. All three physics-aware variants give a small but statistically significant OOD improvement
over the baseline (mean OOD AUC 0.9774): +complexity Δ = +0.0060 (95% CI [+0.0023, +0.0102]), +strain Δ = +0.0032 ([+0.0008, +0.0052]), +both Δ = +0.0066 ([+0.0038,
+0.0093]); every interval excludes zero, and the combination is best. The variants are indistinguishable in-distribution, so the effect is visible only under OOD evaluation. We are
explicit that the effects are modest, and we report a cautionary methodological finding: a single-seed version of this experiment produced a qualitatively different (non-monotone)
story that did not survive multi-seed evaluation.
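
The paired bootstrap confidence intervals reported above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say whether resampling runs over test molecules, seeds, or both, so this version assumes resampling over test molecules with one score per molecule for each model (for example, a seed-averaged score). The names `auc`, `paired_bootstrap_delta`, `n_boot`, and the toy data are hypothetical. The key idea is that both models are scored on the same resampled molecules, so the interval describes the AUC difference rather than either AUC alone.

```python
import random


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_delta(labels, scores_base, scores_variant,
                           n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC(variant) - AUC(base).

    Assumption: molecules are the resampling unit. Each resample draws
    indices with replacement and evaluates BOTH models on those same
    indices, which is what makes the bootstrap paired.
    """
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # skip degenerate single-class resamples
            continue
        base = auc(y, [scores_base[i] for i in idx])
        variant = auc(y, [scores_variant[i] for i in idx])
        deltas.append(variant - base)
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[int((1 - alpha / 2) * n_boot) - 1]
    point = auc(labels, scores_variant) - auc(labels, scores_base)
    return point, lo, hi


# Toy data (hypothetical): 50 positives and 50 negatives. The variant
# separates the classes perfectly; the base model swaps the score of
# 10 positives and 10 negatives, giving it an AUC of 0.8 by construction.
labels = [i % 2 for i in range(100)]
variant = [float(y) for y in labels]
base = [1.0 - y if i % 10 in (0, 1) else float(y) for i, y in enumerate(labels)]

point, lo, hi = paired_bootstrap_delta(labels, base, variant)
print(f"delta AUC = {point:+.4f}, 95% CI [{lo:+.4f}, {hi:+.4f}]")
```

An interval that excludes zero, as in the abstract, indicates that the improvement is unlikely to be a resampling artifact. It does not by itself show that the improvement is large, which matches the authors' caution about modest effects.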