Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models

πŸ“… 2025-07-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing post-training alignment methods for diffusion models (DMs) rely on computationally intensive reward modeling and optimization, incurring high computational cost while compromising accuracy and training efficiency. This paper proposes Inversion-DPO, a reward-free alignment framework that integrates DDIM inversion with Direct Preference Optimization (DPO). By applying deterministic DDIM inversion to preference-labeled win/loss image pairs, the method recovers the latent noise variables and constructs a contrastive training objective, thereby bypassing both reward modeling and approximate posterior inference. Inversion-DPO establishes an efficient, deterministic post-training paradigm and significantly outperforms state-of-the-art methods on text-to-image and compositional generation tasks, yielding superior image fidelity and structural consistency. To support future research, the authors publicly release a paired dataset of 11,140 preference-aligned images with structural annotations.


πŸ“ Abstract
Recent advancements in diffusion models (DMs) have been propelled by alignment methods that post-train models to better conform to human preferences. However, these approaches typically require computation-intensive training of a base model and a reward model, which not only incurs substantial computational overhead but may also compromise model accuracy and training efficiency. To address these limitations, we propose Inversion-DPO, a novel alignment framework that circumvents reward modeling by reformulating Direct Preference Optimization (DPO) with DDIM inversion for DMs. Our method replaces the intractable posterior sampling in Diffusion-DPO with deterministic inversion from winning and losing samples to noise, and thus derives a new post-training paradigm. This paradigm eliminates the need for auxiliary reward models or inaccurate approximation, significantly enhancing both the precision and efficiency of training. We apply Inversion-DPO to a basic task of text-to-image generation and a challenging task of compositional image generation. Extensive experiments show substantial performance improvements achieved by Inversion-DPO compared to existing post-training methods and highlight the ability of the trained generative models to generate high-fidelity, compositionally coherent images. For the post-training of compositional image generation, we curate a paired dataset consisting of 11,140 images with complex structural annotations and comprehensive scores, designed to enhance the compositional capabilities of generative models. Inversion-DPO explores a new avenue for efficient, high-precision alignment in diffusion models, advancing their applicability to complex realistic generation tasks. Our code is available at https://github.com/MIGHTYEZ/Inversion-DPO
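The core idea in the abstract can be sketched in two pieces: a deterministic DDIM inversion step that maps an image latent back toward noise, and a DPO-style contrastive loss computed on the denoising errors of the winning and losing samples. The sketch below is illustrative only, assuming scalar error summaries and hypothetical function names; it is not the authors' implementation.

```python
import numpy as np

def ddim_invert_step(x_t, eps_pred, alpha_t, alpha_next):
    """One deterministic DDIM inversion step: map x_t forward to x_{t+1}.

    Runs the DDIM update in reverse, so the noise latent is recovered
    deterministically rather than sampled from an intractable posterior.
    """
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    return np.sqrt(alpha_next) * x0_pred + np.sqrt(1.0 - alpha_next) * eps_pred

def logsigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)

def inversion_dpo_loss(err_theta_w, err_ref_w, err_theta_l, err_ref_l, beta=0.1):
    """DPO-style contrastive loss on denoising errors (illustrative scalars).

    err_theta_* : denoising error of the model being trained on the
                  win (w) / loss (l) latents recovered by inversion.
    err_ref_*   : the same errors under the frozen reference model.
    The loss decreases when the trained model improves on the winning
    sample relative to the losing one, measured against the reference.
    """
    advantage_w = err_theta_w - err_ref_w
    advantage_l = err_theta_l - err_ref_l
    return -logsigmoid(-beta * (advantage_w - advantage_l))
```

Note that when the trained model's error on the winning sample drops (relative to the reference), the loss shrinks, which is the preference-alignment signal that an explicit reward model would otherwise have to provide.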
Problem

Research questions and friction points this paper is trying to address.

How to align diffusion models without training a reward model
How to improve the precision and efficiency of DM post-training
How to enable high-fidelity compositional image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates DPO with DDIM inversion
Eliminates need for reward models
Enhances training precision and efficiency
πŸ”Ž Similar Papers
No similar papers found.