Fine-Tuning Diffusion Models for Molecular Generation via Reinforcement Learning and Fast Sampling

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a central challenge in structure-based drug design: generating drug-like molecules that also fit the three-dimensional structure of a target protein, a task hindered by the difficulty of multi-objective optimization and by low sampling efficiency. The authors propose FTDiff, a framework that combines reinforcement learning with a time-free pretrained diffusion model. It uses a GRPO-style strategy to fine-tune the pretrained model stably and introduces a threshold-aware reward to balance multiple objectives. A fast sampling scheme reduces the number of denoising steps, which speeds up both training and inference. According to the reported experiments on benchmark datasets, FTDiff generates valid, diverse, high-quality molecules and outperforms prior methods without requiring post-hoc optimization or intricate data engineering.
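The summary does not say how the threshold-aware reward is defined. As a minimal sketch of one possible reading, the snippet below gives each objective credit in proportion to its score until a fixed threshold is met, and caps the credit there so that no single objective can dominate. The objective names, the threshold values, and the assumption that every score is normalized to [0, 1] with higher being better are all hypothetical and not taken from the paper.

```python
from typing import Dict

# Hypothetical objectives and thresholds, chosen only for illustration.
# Scores are assumed to be normalized to [0, 1], higher is better.
THRESHOLDS: Dict[str, float] = {"qed": 0.5, "sa": 0.6, "affinity": 0.7}


def threshold_aware_reward(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> float:
    """Average per-objective credit, capped once a score meets its threshold.

    This is an assumed form of a threshold-aware reward, not the paper's
    definition.
    """
    total = 0.0
    for name, threshold in thresholds.items():
        # Credit grows linearly up to the threshold, then saturates at 1.0.
        credit = min(max(scores[name], 0.0) / threshold, 1.0)
        total += credit
    return total / len(thresholds)
```

Under this form, a molecule that just meets every threshold receives the same reward as one that far exceeds a single threshold while meeting the others, which is one way to discourage trading off one objective against another.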

📝 Abstract

Generating molecules that simultaneously satisfy drug-like properties and conform to the 3D structure of a target protein is a core challenge in structure-based drug design (SBDD). Existing generative approaches, however, often rely on costly post-hoc processing during sampling or require carefully curated datasets during training, yet still achieve only modest gains. These limitations are especially pronounced in multi-objective settings, where balancing conflicting criteria remains difficult. To address these challenges, we propose FTDiff, a reinforcement learning fine-tuning framework tailored for diffusion-based molecular generation under structural constraints. To ensure stable and sample-efficient optimization, FTDiff adopts a group relative policy optimization (GRPO) style strategy. Furthermore, FTDiff builds upon a time-free pretrained diffusion model and incorporates a fast sampling mechanism that reduces the number of denoising steps, significantly accelerating both training and inference while maintaining generation quality. By optimizing a fixed threshold-aware reward, FTDiff effectively guides the model to produce valid, diverse, and high-quality molecules that balance multiple drug design objectives. Extensive experiments on benchmark datasets demonstrate that FTDiff consistently outperforms prior methods, without requiring expensive post-hoc optimization or intricate data engineering.
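The core idea behind GRPO-style updates is that several samples drawn for the same condition are scored together, and each sample's advantage is its reward standardized against the group. The sketch below shows only that normalization step, under the assumption that a group corresponds to molecules sampled for one protein pocket. The function name and the `eps` value are illustrative, and the abstract does not specify how FTDiff forms groups or applies the advantages to the diffusion model.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    are ranked against their own group rather than a learned value baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: rewards for four molecules sampled for the same (hypothetical) pocket.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])
```

Samples above the group mean receive positive advantages and those below receive negative ones. If every sample in a group earns the same reward, all advantages are zero and the group contributes no gradient signal.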

Problem

Research questions and friction points this paper is trying to address.

molecular generation

structure-based drug design

multi-objective optimization

3D structure

drug-like properties

Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning fine-tuning

diffusion models

fast sampling