Follow-Your-Preference: Towards Preference-Aligned Image Inpainting

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses preference alignment in image inpainting. We propose a lightweight, model-agnostic alignment framework that requires no architectural modifications or additional data collection: high-quality preference training data are synthesized from multiple publicly available reward models; systematic biases across brightness, composition, and color dimensions are rigorously characterized; and a simple yet effective multi-reward-model ensemble strategy is introduced to mitigate such biases. Alignment is achieved via direct preference optimization (DPO) applied to off-the-shelf generative models. Our approach consistently outperforms prior methods across standard quantitative metrics, GPT-4V-based evaluation, and human studies—demonstrating substantial improvements in both subjective quality and output consistency of inpainted results. The method establishes a new baseline for preference alignment that is concise, robust, and fully reproducible.

📝 Abstract
This paper investigates image inpainting with preference alignment. Instead of introducing a novel method, we go back to basics and revisit fundamental problems in achieving such alignment. We leverage the prominent direct preference optimization approach for alignment training and employ public reward models to construct preference training datasets. Experiments are conducted across nine reward models, two benchmarks, and two baseline models with varying structures and generative algorithms. Our key findings are as follows: (1) Most reward models deliver valid reward scores for constructing preference data, even if some of them are not reliable evaluators. (2) Preference data demonstrates robust trends in both candidate scaling and sample scaling across models and benchmarks. (3) Observable biases in reward models, particularly in brightness, composition, and color scheme, make them susceptible to reward hacking. (4) A simple ensemble of these models yields robust and generalizable results by mitigating such biases. Built upon these observations, our alignment models significantly outperform prior models across standard metrics, GPT-4 assessments, and human evaluations, without any changes to model structures or the use of new datasets. We hope our work can set a simple yet solid baseline, pushing forward this promising frontier. Our code is open-sourced at: https://github.com/shenytzzz/Follow-Your-Preference.
Problem

Research questions and friction points this paper is trying to address.

Aligning image inpainting outputs with human preferences through optimization
Investigating reward model biases and robustness in preference training
Establishing simple ensemble methods to mitigate reward hacking issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging direct preference optimization for alignment training
Employing public reward models to construct datasets
Using model ensemble to mitigate reward biases
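
The ensemble idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the reward functions are hypothetical stand-ins for the public reward models the paper uses, and the per-model normalization scheme is an assumption. The core idea matches the paper: score each inpainting candidate with several reward models, combine the normalized scores, and take the best and worst candidates as the (chosen, rejected) pair for DPO training.

```python
# Sketch of multi-reward-model ensemble scoring for DPO preference pairs.
# Reward functions and z-score normalization are illustrative assumptions.
from statistics import mean, pstdev


def znorm(scores):
    """Normalize one reward model's scores so models are comparable."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma if sigma > 0 else 0.0 for s in scores]


def build_preference_pair(candidates, reward_fns):
    """Return (chosen, rejected) candidates by ensemble-averaged reward."""
    # Score every candidate with every reward model, normalizing per model
    # so that no single model's scale dominates the ensemble.
    per_model = [znorm([fn(c) for c in candidates]) for fn in reward_fns]
    # Average normalized scores across models for each candidate.
    ensemble = [mean(col) for col in zip(*per_model)]
    ranked = sorted(zip(ensemble, candidates), key=lambda t: t[0])
    return ranked[-1][1], ranked[0][1]  # best = chosen, worst = rejected


# Toy usage with stand-in reward functions (real ones would be learned models).
cands = ["img_a", "img_b", "img_c"]
fns = [lambda c: len(c), lambda c: ord(c[-1])]
chosen, rejected = build_preference_pair(cands, fns)
```

Averaging normalized scores across models dampens any single model's systematic bias (e.g. a preference for brighter outputs), which is the mechanism the paper credits for mitigating reward hacking.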
Yutao Shen — The University of Tokyo
Junkun Yuan — Research Scientist, Tencent (Computer Vision · Multimodal AI · Generative AI)
Toru Aonishi — The University of Tokyo
Hideki Nakayama — The University of Tokyo
Yue Ma — Bytedance (NLP · Dialogue System · LLM)