Composable Score-based Graph Diffusion Model for Multi-Conditional Molecular Generation

📅 2025-09-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing controllable molecular graph generation methods perform poorly when several conditions must be satisfied jointly, primarily because they rely on joint conditional modeling or continuous relaxations that compromise structural fidelity. This paper introduces CSGD (Composable Score-based Graph Diffusion), a score-matching diffusion model designed for discrete molecular graph spaces that supports flexible, subset-wise conditional guidance over arbitrary combinations of molecular attributes. Key contributions include: (1) the first extension of score matching to discrete molecular graph spaces, via concrete scores; (2) a Composable Guidance (CoG) mechanism for fine-grained, decoupled conditional control; and (3) a Probability Calibration (PC) strategy that adjusts estimated transition probabilities to reduce train-test mismatch. Evaluated on four molecular datasets, the method achieves an average 15.3% improvement in controllability over prior methods while maintaining >98.7% molecular validity and high distributional fidelity.

📝 Abstract
Controllable molecular graph generation is essential for material and drug discovery, where generated molecules must satisfy diverse property constraints. While recent advances in graph diffusion models have improved generation quality, their effectiveness in multi-conditional settings remains limited due to reliance on joint conditioning or continuous relaxations that compromise fidelity. To address these limitations, we propose the Composable Score-based Graph Diffusion model (CSGD), the first model that extends score matching to discrete graphs via concrete scores, enabling flexible and principled manipulation of conditional guidance. Building on this foundation, we introduce two score-based techniques: Composable Guidance (CoG), which allows fine-grained control over arbitrary subsets of conditions during sampling, and Probability Calibration (PC), which adjusts estimated transition probabilities to mitigate train-test mismatches. Empirical results on four molecular datasets show that CSGD achieves state-of-the-art performance, with a 15.3% average improvement in controllability over prior methods, while maintaining high validity and distributional fidelity. Our findings highlight the practical advantages of score-based modeling for discrete graph generation and its capacity for flexible, multi-property molecular design.
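To make the idea of guiding on an arbitrary subset of conditions more concrete, the following is a minimal sketch of one common way to compose guidance for a categorical variable (for example, the type of a single atom or bond). It is not taken from the paper: the function name `compose_guidance`, the per-condition weights, and the log-space combination rule (in the style of classifier-free guidance) are all assumptions made for illustration, and the exact CoG formulation in CSGD may differ.

```python
import math

def compose_guidance(log_p_uncond, log_p_conds, weights):
    """Combine per-condition categorical predictions in log space.

    Hypothetical rule (an assumption, not the paper's formula):
        log p_guided = log p_uncond + sum_i w_i * (log p_cond_i - log p_uncond)
    followed by renormalization. A weight of 0 switches a condition off,
    so any subset of conditions can be selected at sampling time.
    """
    guided = list(log_p_uncond)
    for log_p_c, w in zip(log_p_conds, weights):
        for k in range(len(guided)):
            guided[k] += w * (log_p_c[k] - log_p_uncond[k])
    # Normalize with a numerically stable log-sum-exp.
    m = max(guided)
    log_z = m + math.log(sum(math.exp(g - m) for g in guided))
    return [math.exp(g - log_z) for g in guided]

# Toy distributions over three hypothetical categories.
uncond = [math.log(v) for v in (0.5, 0.3, 0.2)]
cond_a = [math.log(v) for v in (0.7, 0.2, 0.1)]   # e.g. guided by property A
cond_b = [math.log(v) for v in (0.4, 0.5, 0.1)]   # e.g. guided by property B

both = compose_guidance(uncond, [cond_a, cond_b], [1.0, 1.0])
only_a = compose_guidance(uncond, [cond_a, cond_b], [1.0, 0.0])
print(both, only_a)
```

In this sketch, setting a weight to zero drops that condition entirely, and a single weight of 1 recovers that condition's own prediction, which is the sense in which the guidance terms are decoupled.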
Problem

Research questions and friction points this paper is trying to address.

Multi-conditional molecular graph generation with property constraints
Addressing limited effectiveness in multi-conditional graph diffusion models
Enabling flexible control over arbitrary condition subsets during sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concrete scores enable discrete graph diffusion
Composable Guidance allows fine-grained condition control
Probability Calibration mitigates train-test mismatches
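
As background for the first point, a concrete score replaces the gradient of a continuous score function with probability ratios between a discrete state and its alternatives. The sketch below computes such ratios for a single categorical variable; the helper names and the toy distribution are hypothetical, and the ratio form p(y)/p(x) follows the general concrete-score literature rather than a parameterization confirmed by this summary.

```python
def concrete_score(p, x):
    """Ratios p(y) / p(x) for every alternative state y != x."""
    return {y: p[y] / p[x] for y in range(len(p)) if y != x}

def prob_from_scores(scores):
    """Recover p(x) from its ratios, using sum_y p(y) = 1."""
    return 1.0 / (1.0 + sum(scores.values()))

# Toy distribution over four hypothetical bond types.
p = [0.5, 0.25, 0.125, 0.125]
scores = concrete_score(p, x=1)
print(scores)                    # {0: 2.0, 2: 0.5, 3: 0.5}
print(prob_from_scores(scores))  # 0.25
```

Because the ratios determine the distribution, a model that estimates them can be turned back into transition probabilities, which is what makes score-level manipulations such as guidance applicable to discrete graphs.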
Anjie Qiao
Sun Yat-sen University
Zhen Wang
Sun Yat-sen University
Chuan Chen
University of Wisconsin, Madison
Applied Microeconomics
DeFu Lian
University of Science and Technology of China
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning