🤖 AI Summary
To address the large number of diffusion steps and slow sampling of diffusion models for small-molecule generation, this paper proposes DMol, a graph diffusion model built on a modified objective function and a graph-level noise schedule. Methodologically, it introduces (1) a "graph noise" scheduling approach that, at each diffusion step, changes only a subset of nodes (of varying size) in the molecule graph, and (2) a compressed variant that contracts a carefully selected set of frequent carbon rings into supernodes and runs diffusion directly on the compressed graph. This avoids the VAEs and complicated reconstruction steps of classical junction-tree methods. Across all benchmarking datasets, DMol improves validity over DiGress by roughly 1.5%, reduces the number of diffusion steps at least 10-fold, and roughly halves the running time. Compressed DMol adds roughly 2% further validity, increases novelty, and reduces running time further because the graphs are smaller.
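The subset-wise "graph noise" idea described above can be illustrated with a minimal sketch of a forward (noising) process. The text does not give DMol's actual schedule or objective, so everything here is an assumption. Only node labels (atom types) are noised, and edges are left untouched. The number of nodes re-noised at step t grows linearly with t. Each chosen node is resampled uniformly from a small atom vocabulary. `subset_sizes` and `forward_noise` are hypothetical names.

```python
import random
from typing import List, Sequence


def subset_sizes(num_nodes: int, num_steps: int) -> List[int]:
    # Hypothetical schedule: the number of nodes re-noised at step t grows
    # linearly with t, from about num_nodes/num_steps up to num_nodes.
    if num_nodes == 0:
        return [0] * num_steps
    return [
        max(1, min(num_nodes, round(num_nodes * (t + 1) / num_steps)))
        for t in range(num_steps)
    ]


def forward_noise(node_types: Sequence[str], num_steps: int,
                  vocab: Sequence[str], seed: int = 0) -> List[List[str]]:
    """Corrupt node labels step by step, touching only a subset per step.

    Returns the trajectory [x_0, x_1, ..., x_T]. Edges are ignored in this sketch.
    """
    rng = random.Random(seed)
    x = list(node_types)
    trajectory = [list(x)]
    for k in subset_sizes(len(x), num_steps):
        # Pick k distinct nodes and resample only their labels, uniformly.
        for i in rng.sample(range(len(x)), k):
            x[i] = rng.choice(vocab)
        trajectory.append(list(x))
    return trajectory
```

With this particular schedule, the last step resamples every node. The final state is therefore a draw from the uniform prior, independent of the input molecule. Earlier steps change at most k labels each, and possibly fewer, since a resampled node can land on its original label. A reverse (denoising) model would be trained to undo these partial corruptions. That part, and the choice of a linear schedule, are not specified by the source.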
📝 Abstract
We introduce DMol, a new graph diffusion model for small molecule generation. It outperforms the state-of-the-art DiGress model in validity by roughly 1.5% across all benchmarking datasets, while reducing the number of diffusion steps at least 10-fold and cutting the running time roughly in half. These improvements result from a careful change in the objective function and a "graph noise" scheduling approach that, at each diffusion step, changes only a subset of nodes, of varying size, in the molecule graph. The method can also be easily combined with junction-tree-like graph representations, which arise by compressing a collection of relevant ring structures into supernodes. Classical junction-tree techniques involve VAEs and require complicated reconstruction steps. Compressed DMol instead performs graph diffusion directly on a graph in which only a carefully selected set of frequent carbon rings is compressed into supernodes, which makes sample generation straightforward. Compressed DMol improves validity over generic DMol by roughly a further 2%, increases novelty, and further reduces running time because the graphs are smaller.
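The ring-compression step described above can be illustrated with a minimal pure-Python sketch. The abstract does not say how rings are selected or how supernodes are represented, so the following are all assumptions:

- Rings are given as tuples of atom indices, for example from RDKit's `GetRingInfo().AtomRings()`.
- A ring's signature is the sorted tuple of its atom symbols, ignoring bond orders and aromaticity.
- "Frequent" means among the `top_k` most common all-carbon signatures in a training set.
- Fused rings are handled greedily: any ring that shares an atom with an already-contracted ring is skipped.

`ring_signature`, `frequent_carbon_rings`, `compress`, and the `SUPER:` label prefix are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Ring = Tuple[int, ...]
Edge = Tuple[int, int]


def ring_signature(ring: Ring, labels: Sequence[str]) -> Tuple[str, ...]:
    # Sorted atom symbols; bond orders and aromaticity are ignored here.
    return tuple(sorted(labels[i] for i in ring))


def frequent_carbon_rings(dataset: Iterable[Tuple[Sequence[str], List[Ring]]],
                          top_k: int) -> Set[Tuple[str, ...]]:
    """Return the top_k most common all-carbon ring signatures in the dataset."""
    counts: Counter = Counter()
    for labels, rings in dataset:
        for ring in rings:
            sig = ring_signature(ring, labels)
            if all(a == "C" for a in sig):
                counts[sig] += 1
    return {sig for sig, _ in counts.most_common(top_k)}


def compress(labels: Sequence[str], edges: List[Edge], rings: List[Ring],
             vocab: Set[Tuple[str, ...]]) -> Tuple[List[str], List[Edge]]:
    """Contract every ring whose signature is in vocab into one supernode."""
    owner: Dict[int, int] = {}  # original atom index -> new node index
    new_labels: List[str] = []
    for ring in rings:
        sig = ring_signature(ring, labels)
        # Greedy: skip rings that share an atom with an already-contracted ring.
        if sig in vocab and not any(i in owner for i in ring):
            for i in ring:
                owner[i] = len(new_labels)
            new_labels.append("SUPER:" + "".join(sig))
    # Atoms outside contracted rings keep their own node.
    for i, label in enumerate(labels):
        if i not in owner:
            owner[i] = len(new_labels)
            new_labels.append(label)
    new_edges: Set[Edge] = set()
    for u, v in edges:
        a, b = owner[u], owner[v]
        if a != b:  # bonds inside a contracted ring disappear
            new_edges.add((min(a, b), max(a, b)))  # parallel bonds merge
    return new_labels, sorted(new_edges)
```

Take toluene as an example. The six ring carbons collapse into one supernode bonded to the methyl carbon, so the diffusion model sees a two-node graph. A pyridine ring is left unchanged because its signature contains nitrogen.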