MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning

📅 2025-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Molecular editing requires optimizing target properties while preserving structural similarity; however, existing string- or continuous-representation-based methods fail to adequately model molecules' inherent discrete graph structure, resulting in low structural fidelity and poor edit controllability. To address this, we propose the first two-stage framework integrating discrete graph diffusion pretraining with graph-constrained reinforcement learning: (1) a discrete graph diffusion model learns molecular graph structural priors; (2) an edit-aware, instruction-conditioned RL agent performs graph-structure-constrained action selection guided by natural-language instructions. We introduce MolEdit-Instruct, a large-scale multi-attribute molecular editing instruction dataset comprising 3 million samples. Experiments demonstrate that our method achieves a 74% improvement in edit success rate, reduces parameter count by 98%, and outperforms state-of-the-art methods across both property optimization accuracy and structural similarity.

📝 Abstract
Molecular editing aims to modify a given molecule to optimize desired chemical properties while preserving structural similarity. However, current approaches typically rely on string-based or continuous representations, which fail to adequately capture the discrete, graph-structured nature of molecules, resulting in limited structural fidelity and poor controllability. In this paper, we propose MolEditRL, a molecular editing framework that explicitly integrates structural constraints with precise property optimization. Specifically, MolEditRL consists of two stages: (1) a discrete graph diffusion model pretrained to reconstruct target molecules conditioned on source structures and natural language instructions; (2) an editing-aware reinforcement learning fine-tuning stage that further enhances property alignment and structural preservation by explicitly optimizing editing decisions under graph constraints. For comprehensive evaluation, we construct MolEdit-Instruct, the largest and most property-rich molecular editing dataset, comprising 3 million diverse examples spanning single- and multi-property tasks across 10 chemical attributes. Experimental results demonstrate that MolEditRL significantly outperforms state-of-the-art methods in both property optimization accuracy and structural fidelity, achieving a 74% improvement in editing success rate while using 98% fewer parameters.
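
The forward (noising) half of a discrete graph diffusion model, as used in stage (1), can be illustrated with a small sketch. The abstract does not state which transition kernel MolEditRL uses, so the uniform kernel, the bond label set, and the function names below are assumptions chosen for illustration only. Under a uniform kernel, each bond label is kept with probability alpha_bar_t and otherwise resampled uniformly over the K labels.

```python
import random

# Assumed label set for illustration; index = category id.
BOND_TYPES = ["none", "single", "double", "triple"]


def alpha_bar(betas, t):
    """Probability that a label has never been resampled after t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def marginal(x0, betas, t, k=len(BOND_TYPES)):
    """q(x_t | x_0) for a uniform transition kernel (an assumed choice)."""
    a = alpha_bar(betas, t)
    return [a * (1.0 if j == x0 else 0.0) + (1.0 - a) / k for j in range(k)]


def corrupt_edges(edges, betas, t, rng):
    """Sample a noisy bond label for each atom pair (i, j) with i < j.

    One label is stored per unordered pair, so the graph stays undirected.
    """
    noisy = {}
    for pair, x0 in edges.items():
        probs = marginal(x0, betas, t)
        noisy[pair] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return noisy


# Hypothetical three-atom graph: single bond 0-1, double bond 1-2, no bond 0-2.
betas = [0.1] * 10
edges = {(0, 1): 1, (1, 2): 2, (0, 2): 0}
noisy_edges = corrupt_edges(edges, betas, t=5, rng=random.Random(0))
```

A denoising network would then be trained to recover the clean labels from `noisy_edges`, conditioned (per the abstract) on the source structure and the instruction; that part is omitted here.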
Problem

Research questions and friction points this paper is trying to address.

Optimize molecular properties while preserving structural similarity
Overcome limitations of string-based or continuous molecular representations
Enhance property alignment and structural preservation via reinforcement learning
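
The tension between property gain and structural preservation listed above can be sketched as a scalar reward. This is not the paper's reward; the weighting, the similarity threshold, and the use of Tanimoto similarity over fingerprint bit sets are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def edit_reward(prop_src, prop_edit, fp_src, fp_edit, sim_weight=0.5, min_sim=0.4):
    """Hypothetical reward: property gain plus a similarity bonus.

    Edits that fall below the similarity threshold get a fixed penalty,
    regardless of how much the property improved.
    """
    sim = tanimoto(fp_src, fp_edit)
    if sim < min_sim:
        return -1.0
    return (prop_edit - prop_src) + sim_weight * sim
```

With `fp_src = {1, 2, 3}` and `fp_edit = {2, 3, 4}` the similarity is 2/4 = 0.5, so raising the property from 1.0 to 2.0 yields a reward of 1.0 + 0.5 * 0.5 = 1.25 under these assumed defaults.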
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete graph diffusion for molecular reconstruction
Reinforcement learning for property alignment
Graph constraints for structural preservation
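
One simple way to picture a graph constraint on editing actions is a valence mask: an edit is only offered to the agent if it keeps every atom within its maximum valence. The sketch below is a minimal illustration under that assumption; the valence table, the heavy-atom-only graph encoding, and the function names are hypothetical and are not taken from the paper.

```python
# Assumed maximum valences for a few common elements (implicit hydrogens
# are taken to fill any unused valence).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def used_valence(atom_idx, bonds):
    """Sum of bond orders on all bonds touching the given atom."""
    return sum(order for (i, j), order in bonds.items() if atom_idx in (i, j))


def valid_bond_increments(atoms, bonds):
    """Return atom pairs (i, j), i < j, whose bond order can be raised by one.

    A pair is allowed only if both atoms have at least one free valence
    and the current bond order is below triple.
    """
    free = [MAX_VALENCE[sym] - used_valence(k, bonds) for k, sym in enumerate(atoms)]
    actions = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            order = bonds.get((i, j), 0)
            if free[i] >= 1 and free[j] >= 1 and order < 3:
                actions.append((i, j))
    return actions


# Hypothetical heavy-atom graph for acetaldehyde: C-C=O.
atoms = ["C", "C", "O"]
bonds = {(0, 1): 1, (1, 2): 2}
allowed = valid_bond_increments(atoms, bonds)
```

Here the oxygen is already saturated by the double bond, so the only permitted increment is on the C-C bond. A policy would then normalize its action probabilities over `allowed` rather than over all atom pairs.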
Yuanxin Zhuang
Artificial Intelligence Thrust, Hong Kong University of Science and Technology (Guangzhou)
Dazhong Shen
Nanjing University of Aeronautics and Astronautics
Data Mining, Generative AI
Ying Sun
Artificial Intelligence Thrust, Hong Kong University of Science and Technology (Guangzhou)