Fine-Tuning Diffusion Models for Molecular Generation via Reinforcement Learning and Fast Sampling

📅 2026-05-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a central challenge in structure-based drug design: generating drug-like molecules that also fit the three-dimensional structure of a target protein, a task hindered by the difficulty of multi-objective optimization and by low sampling efficiency. The authors propose FTDiff, a framework that integrates reinforcement learning with a time-independent diffusion model. It uses a GRPO-like strategy to stably fine-tune a pretrained diffusion model and introduces a threshold-aware reward mechanism to balance multiple objectives. A fast sampling scheme reduces the number of denoising steps, improving both training and inference efficiency. According to the reported experiments, FTDiff generates valid, diverse, and high-quality molecules on standard benchmarks and outperforms existing methods without requiring post-processing or sophisticated data engineering.
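The summary does not spell out the form of the threshold-aware reward, so the following is only one plausible reading, not the authors' definition. It assumes every objective is scored "higher is better" and caps each score at a threshold, so an objective stops contributing once it is satisfied. The objective names (`qed`, `sa`, `affinity`) and the threshold values are hypothetical placeholders.

```python
from typing import Dict

# Hypothetical objectives and thresholds; the source lists neither.
THRESHOLDS: Dict[str, float] = {"qed": 0.5, "sa": 0.6, "affinity": 8.0}


def threshold_aware_reward(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> float:
    """Mean of per-objective scores, each capped at its threshold.

    Assumption: all scores are 'higher is better'. Capping removes the
    incentive to over-optimize one property at the expense of the others.
    """
    total = 0.0
    for name, threshold in thresholds.items():
        capped = min(scores[name], threshold)
        total += capped / threshold  # at most 1.0 per objective
    return total / len(thresholds)


# A molecule that meets every threshold gets the maximum reward of 1.0,
# and exceeding a threshold earns nothing extra.
at_threshold = threshold_aware_reward({"qed": 0.5, "sa": 0.6, "affinity": 8.0})
above_threshold = threshold_aware_reward({"qed": 0.9, "sa": 0.9, "affinity": 12.0})
```

Under this reading the reward saturates at 1.0, so further gains can only come from the objectives that are still below their thresholds.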
📝 Abstract
Generating molecules that simultaneously satisfy drug-like properties and conform to the 3D structure of a target protein is a core challenge in structure-based drug design (SBDD). Existing generative approaches, however, often rely on costly post-hoc processing during sampling or require carefully curated datasets during training, yet still achieve only modest gains. These limitations are especially pronounced in multi-objective settings, where balancing conflicting criteria remains difficult. To address these challenges, we propose FTDiff, a reinforcement learning fine-tuning framework tailored for diffusion-based molecular generation under structural constraints. To ensure stable and sample-efficient optimization, FTDiff adopts a group relative policy optimization (GRPO) style strategy. Furthermore, FTDiff builds upon a time-free pretrained diffusion model and incorporates a fast sampling mechanism that reduces the number of denoising steps, significantly accelerating both training and inference while maintaining generation quality. By optimizing a fixed threshold-aware reward, FTDiff effectively guides the model to produce valid, diverse, and high-quality molecules that balance multiple drug design objectives. Extensive experiments on benchmark datasets demonstrate that FTDiff consistently outperforms prior methods, without requiring expensive post-hoc optimization or intricate data engineering.
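To illustrate what "GRPO-style" usually means, here is a minimal sketch of the group-relative advantage used in standard GRPO: rewards for a group of samples drawn under the same condition are normalized by the group's own mean and standard deviation, so no learned value network is needed. Treating one protein pocket as the group, and the function name itself, are assumptions for illustration; the abstract does not describe FTDiff's exact variant.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of samples.

    Assumption: the group holds molecules sampled for the same target
    pocket. Each advantage is (reward - group mean) / (group std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Three molecules sampled for one pocket: the best one gets a positive
# advantage, the worst a negative one, and the advantages sum to ~0.
advantages = group_relative_advantages([1.0, 2.0, 3.0])

# If every sample scores the same, there is no learning signal.
flat = group_relative_advantages([2.0, 2.0, 2.0])
```

Because the baseline is the group mean, only the relative ranking within a group drives the update, which is one reason this family of methods is often described as stable.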
Problem

Research questions and friction points this paper is trying to address.

molecular generation
structure-based drug design
multi-objective optimization
3D structure
drug-like properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning fine-tuning
diffusion models
fast sampling
structure-based drug design
multi-objective optimization