Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

📅 2025-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low sample efficiency and poor training stability of reinforcement learning (RL) and the substantial gradient bias of truncated backpropagation (BP) in downstream alignment of diffusion models (DMs), this paper proposes a zeroth-order fine-tuning paradigm. At its core lies the novel Recursive Likelihood Ratio (RLR) optimizer, which theoretically guarantees unbiased gradients with low variance. By integrating computational graph reordering with chain-based modeling of the diffusion process, the method enables efficient zeroth-order gradient estimation. The authors further design RLR-adapted prompting techniques to enhance guidance precision and improve synergy between optimization and generation. Evaluated on image and video generation tasks, the approach consistently outperforms both RL- and truncated-BP-based methods, achieving faster convergence, greater training stability, and superior generation quality. This work establishes a new, efficient paradigm for aligning diffusion models with downstream objectives.

📝 Abstract
The probabilistic diffusion model (DM), which generates content through inference over a recursive chain structure, has emerged as a powerful framework for visual generation. After pre-training on enormous amounts of unlabeled data, the model needs to be properly aligned to meet the requirements of downstream applications, so efficiently aligning the foundation DM is a crucial task. Contemporary methods are based either on Reinforcement Learning (RL) or on truncated Backpropagation (BP). However, RL and truncated BP suffer from low sample efficiency and biased gradient estimation respectively, resulting in limited improvement or, even worse, complete training failure. To overcome these challenges, we propose the Recursive Likelihood Ratio (RLR) optimizer, a zeroth-order informed fine-tuning paradigm for DMs. The zeroth-order gradient estimator enables rearrangement of the computation graph within the recursive diffusion chain, making the RLR's gradient estimator unbiased with lower variance than other methods. We provide theoretical guarantees for the performance of the RLR. Extensive experiments on image and video generation tasks validate the superiority of the RLR. Furthermore, we propose a novel prompting technique that is natural for the RLR and achieves a synergistic effect.
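The zeroth-order, likelihood-ratio estimation the abstract builds on can be illustrated with a minimal score-function gradient sketch. This is not the paper's RLR optimizer itself (which recurses through the diffusion chain); it is the generic one-step building block: estimating a gradient from function evaluations alone via the score of the sampling distribution. All names here are illustrative, and the toy objective is an assumption for demonstration:

```python
import numpy as np

# Likelihood-ratio (score-function) gradient estimator: estimate
# d/dmu E_{x ~ N(mu, sigma^2)}[f(x)] using only evaluations of f,
# weighted by the score d/dmu log N(x; mu, sigma^2) = (x - mu)/sigma^2.
# This estimator is unbiased and requires no backpropagation through f.
def lr_gradient(f, mu, sigma, n_samples=200_000, rng=None):
    rng = rng or np.random.default_rng(0)
    x = rng.normal(mu, sigma, size=n_samples)
    score = (x - mu) / sigma**2        # score function of the Gaussian
    return np.mean(f(x) * score)       # Monte Carlo likelihood-ratio estimate

# Toy check: for f(x) = x^2, E[f] = mu^2 + sigma^2, so the true
# gradient w.r.t. mu is 2*mu.
mu, sigma = 1.5, 0.5
est = lr_gradient(lambda x: x**2, mu, sigma)  # close to 2*mu = 3.0
```

Because each diffusion denoising step is itself a Gaussian transition, an estimator of this form can be applied per step without differentiating through the whole recursive chain, which is what lets the paper avoid both truncated BP's bias and full BP's memory cost.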
Problem

Research questions and friction points this paper is trying to address.

Probabilistic Diffusion Models
Reinforcement Learning
Truncated Backpropagation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive Likelihood Ratio (RLR) Optimizer
Probabilistic Diffusion Models (DM) Fine-tuning
Prompting Technique for Improved Image and Video Generation
Tao Ren
Peking University
Foundation model, Optimization, Reinforcement Learning
Zishi Zhang
Peking University
Simulation optimization, AI
Zehao Li
Peking University
Operations research, Stochastic approximation
Jingyang Jiang
Guanghua School of Management, Peking University
Shentao Qin
Tsinghua University
Guanghao Li
Fudan University
Graphics
Yan Li
The Hong Kong University of Science and Technology
Yi Zheng
Guanghua School of Management, Peking University
Xinping Li
School of Economics, Peking University
Min Zhan
Hunan University of Technology and Business
Yijie Peng
Peking University
Simulation, Bayesian Learning, Artificial Intelligence, Healthcare, Financial Engineering