🤖 AI Summary
Diffusion models for 3D molecular generation face an inherent trade-off between sampling efficiency and conformational accuracy. Flow-based models are fast but geometrically imprecise, whereas denoising diffusion models are accurate but slow, a limitation attributed primarily to poor integration between SE(3)-equivariant architectures and diffusion dynamics. This work proposes VEDA, a variance-exploding (VE) diffusion framework with annealing that treats VE noise injection as analogous to simulated annealing. It introduces an SE(3)-invariant preconditioning scheme that accommodates a residual diffusion objective, and it adopts an arcsin-based noise schedule that concentrates learning in critical regimes of the log signal-to-noise ratio. On QM9 and GEOM-DRUGS, the method achieves state-of-the-art validity and valency stability with only 100 sampling steps. During GFN2-xTB relaxation, the median energy change of its generated structures is only 1.72 kcal/mol, far below the 32.3 kcal/mol of its architectural baseline, SemlaFlow. This shows that high sampling efficiency and conformation-level geometric fidelity can be achieved together.
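Neither the summary nor the abstract gives the arcsin schedule's formula, so the following is only a rough sketch under stated assumptions. It warps a uniform grid u ∈ [0, 1] through t = 1/2 + arcsin(2u − 1)/π and maps t linearly onto a log-SNR range. For VE diffusion, x_σ = x_0 + σε, so with the data scale taken as 1 the log-SNR is λ = −2 ln σ. The function name `arcsin_logsnr_schedule` and the bounds λ ∈ [−10, 10] are hypothetical. The summary says the schedule targets learning while the abstract says sampling, and the same warp could serve either purpose.

```python
import math


def arcsin_logsnr_schedule(num_steps, logsnr_min=-10.0, logsnr_max=10.0):
    """Hypothetical arcsin-warped log-SNR grid for a VE sampler (num_steps >= 2).

    Uniform u in [0, 1] is warped by t = 0.5 + asin(2u - 1) / pi. Because
    d(asin x)/dx = 1 / sqrt(1 - x^2) is smallest at x = 0, the warped points
    bunch up in the middle of [logsnr_min, logsnr_max].
    Returns sigmas in decreasing order (noisy -> clean), using the VE relation
    logSNR = -2 * ln(sigma), i.e. sigma = exp(-logSNR / 2).
    """
    sigmas = []
    for i in range(num_steps):
        u = i / (num_steps - 1)                        # uniform grid in [0, 1]
        t = 0.5 + math.asin(2.0 * u - 1.0) / math.pi   # arcsin warp, t in [0, 1]
        logsnr = logsnr_min + t * (logsnr_max - logsnr_min)
        sigmas.append(math.exp(-logsnr / 2.0))         # VE: sigma from log-SNR
    return sigmas


sigmas = arcsin_logsnr_schedule(100)
```

With these assumed bounds, the 100-point grid places 70 of its points in the middle half of the log-SNR range, compared with 50 for a uniform grid.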
📝 Abstract
Diffusion models show promise for 3D molecular generation, but they face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they often produce geometrically inaccurate structures because they have difficulty capturing the multimodal distributions of molecular conformations. In contrast, denoising diffusion models are more accurate but suffer from slow sampling, a limitation attributed to sub-optimal integration between diffusion dynamics and SE(3)-equivariant architectures. To address this, we propose VEDA, a unified SE(3)-equivariant framework that combines variance-exploding (VE) diffusion with annealing to efficiently generate conformationally accurate 3D molecular structures. Specifically, our key technical contributions are: (1) a VE schedule that enables noise injection functionally analogous to simulated annealing, improving 3D accuracy and reducing relaxation energy; (2) a novel preconditioning scheme that reconciles the coordinate-predicting nature of SE(3)-equivariant networks with a residual-based diffusion objective; and (3) a new arcsin-based scheduler that concentrates sampling in critical intervals of the log signal-to-noise ratio. On the QM9 and GEOM-DRUGS datasets, VEDA matches the sampling efficiency of flow-based models, achieving state-of-the-art valency stability and validity with only 100 sampling steps. More importantly, VEDA's generated structures are remarkably stable, as measured by their relaxation energy during GFN2-xTB optimization: the median energy change is only 1.72 kcal/mol, significantly lower than the 32.3 kcal/mol of its architectural baseline, SemlaFlow. Our framework demonstrates that principled integration of VE diffusion with SE(3)-equivariant architectures can achieve both high chemical accuracy and computational efficiency.
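The abstract names a preconditioning scheme but does not give its coefficients. For reference only, the sketch below wraps a network in the standard EDM preconditioning of Karras et al. (2022). This is a swapped-in technique and not necessarily VEDA's scheme. It then integrates the VE probability-flow ODE dx/dσ = (x − D(x; σ))/σ with Euler steps.

The sketch illustrates one property claimed above. Because c_skip, c_out, c_in and c_noise are scalars that depend only on σ, the preconditioned denoiser is rotation-equivariant whenever the wrapped network is. In EDM, F is trained on the residual-like target (x_0 − c_skip·x)/c_out.

The optional `gamma` re-injection follows EDM's stochastic "churn" and is only our stand-in for annealing-like noise injection. All function names, σ_data = 1, and the σ range [0.002, 80] are assumptions. `toy_equivariant_net` is a hypothetical placeholder for a real SE(3)-equivariant network.

```python
import math
import random

SIGMA_DATA = 1.0  # hypothetical: assumed std of centered training coordinates


def center(x):
    """Subtract the centroid so coordinates stay in the translation-free subspace."""
    n = len(x)
    c = [sum(p[k] for p in x) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in x]


def edm_coefficients(sigma, sigma_data=SIGMA_DATA):
    """EDM preconditioning coefficients (Karras et al., 2022). All are scalars
    that depend only on sigma, so they commute with rotations."""
    denom = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / denom
    c_out = sigma * sigma_data / math.sqrt(denom)
    c_in = 1.0 / math.sqrt(denom)
    c_noise = 0.25 * math.log(sigma)
    return c_skip, c_out, c_in, c_noise


def denoise(net, x, sigma):
    """Preconditioned denoiser D(x; sigma) = c_skip * x + c_out * F(c_in * x, c_noise)."""
    c_skip, c_out, c_in, c_noise = edm_coefficients(sigma)
    f = net([[c_in * v for v in p] for p in x], c_noise)
    return [[c_skip * a + c_out * b for a, b in zip(p, q)] for p, q in zip(x, f)]


def toy_equivariant_net(x, c_noise):
    """Hypothetical stand-in for an SE(3)-equivariant network: per-atom sum of
    w(|x_i - x_j|) * (x_i - x_j). Rotating the input rotates the output, and the
    antisymmetric pairwise messages keep the output centroid at zero."""
    out = []
    for i, p in enumerate(x):
        acc = [0.0, 0.0, 0.0]
        for j, q in enumerate(x):
            if i != j:
                diff = [p[k] - q[k] for k in range(3)]
                w = math.exp(-sum(d * d for d in diff)) * (1.0 + 0.1 * c_noise)
                acc = [a + w * d for a, d in zip(acc, diff)]
        out.append(acc)
    return out


def sample(net, n_atoms, sigmas, gamma=0.0, seed=0):
    """Euler integration of dx/dsigma = (x - D(x; sigma)) / sigma from sigmas[0]
    down to sigmas[-1]. gamma > 0 first re-injects centered noise, raising the
    noise level from sigma to sigma * (1 + gamma) (EDM-style churn)."""
    rng = random.Random(seed)

    def noise():
        return center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)])

    x = [[sigmas[0] * v for v in p] for p in noise()]
    for s_cur, s_next in zip(sigmas, sigmas[1:]):
        s_hat = s_cur * (1.0 + gamma)
        if gamma > 0.0:
            extra = math.sqrt(s_hat ** 2 - s_cur ** 2)  # std of the re-injected noise
            x = [[a + extra * e for a, e in zip(p, q)] for p, q in zip(x, noise())]
        d = denoise(net, x, s_hat)
        step = s_next - s_hat  # negative: sigma decreases
        x = [[a + step * (a - b) / s_hat for a, b in zip(p, q)] for p, q in zip(x, d)]
    return x


# 101 geometric noise levels -> 100 Euler steps
sigmas = [80.0 * (0.002 / 80.0) ** (i / 100) for i in range(101)]
coords = sample(toy_equivariant_net, n_atoms=5, sigmas=sigmas, gamma=0.05)
```

Setting `gamma=0.0` gives a deterministic ODE sampler. Larger values re-heat the sample more strongly at every step before it is cooled again.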