RewardFlow: Generate Images by Optimizing What You Reward

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently steering pretrained generative models toward high-fidelity, semantically aligned image editing and synthesis without fine-tuning or image inversion. To this end, it introduces a unified framework that jointly optimizes multiple objectives at inference time via multi-reward Langevin dynamics, spanning semantic alignment, perceptual fidelity, spatial localization, object consistency, and human preference. Key innovations include a differentiable reward grounded in visual question answering (VQA), which provides fine-grained language-to-vision semantic supervision, and a prompt-aware adaptive weighting strategy that modulates reward weights over the course of sampling. The proposed method achieves state-of-the-art editing fidelity and compositional alignment across multiple image editing and compositional generation benchmarks.
📝 Abstract
We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow unifies complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference, and further introduces a differentiable VQA-based reward that provides fine-grained semantic supervision through language-vision reasoning. To coordinate these heterogeneous objectives, we design a prompt-aware adaptive policy that extracts semantic primitives from the instruction, infers edit intent, and dynamically modulates reward weights and step sizes throughout sampling. Across several image editing and compositional generation benchmarks, RewardFlow delivers state-of-the-art edit fidelity and compositional alignment.
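To make the sampling mechanism concrete, here is a minimal sketch of one inference-time multi-reward Langevin step, assuming a flow-matching velocity field integrated with an Euler step. The function names, step count, and step size `eta` are illustrative assumptions, not the authors' implementation.

```python
import torch

# Minimal sketch of one multi-reward Langevin guidance step. All names
# (velocity_fn, rewards, weights, eta) are illustrative assumptions.
def multi_reward_langevin_step(x, t, velocity_fn, rewards, weights, eta=0.05):
    """Advance the pretrained flow one Euler step, then nudge the sample
    along the weighted gradient of the differentiable rewards, with
    Langevin noise for exploration."""
    # Base update from the pretrained flow-matching velocity field.
    with torch.no_grad():
        v = velocity_fn(x, t)
    dt = 1.0 / 50  # assumed 50 sampling steps
    x = x + dt * v

    # Reward ascent: each reward r_k is a differentiable scalar of x.
    x = x.detach().requires_grad_(True)
    total = sum(w * r(x) for r, w in zip(rewards, weights))
    grad = torch.autograd.grad(total.sum(), x)[0]

    # Langevin dynamics: gradient step plus scaled Gaussian noise.
    noise = torch.randn_like(x)
    return x.detach() + eta * grad + (2.0 * eta) ** 0.5 * noise
```

In the full method, the reward weights and step size would themselves be set per step by the prompt-aware adaptive policy rather than held fixed, as sketched further below.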
Problem

Research questions and friction points this paper is trying to address.

reward optimization
image generation
semantic alignment
perceptual fidelity
compositional generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

RewardFlow
multi-reward Langevin dynamics
differentiable VQA-based reward
prompt-aware adaptive policy (see the sketch after this list)
inversion-free image generation
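Complementing the Langevin sketch above, the following is a hypothetical illustration of a prompt-aware adaptive policy: it infers a coarse edit intent from the instruction and schedules per-reward weights over sampling time. The intent keywords, reward names, and weight schedules are invented for illustration and do not reflect the paper's actual policy.

```python
# Hypothetical prompt-aware weighting policy. Intent keywords, reward
# names, and schedules below are illustrative assumptions, not the
# authors' actual design.
def adaptive_reward_weights(instruction: str, t: float) -> dict:
    """Map an edit instruction and sampling time t in [0, 1]
    to per-reward weights."""
    text = instruction.lower()

    # Crude intent inference from semantic primitives in the prompt.
    if any(k in text for k in ("replace", "swap", "turn into")):
        intent = "object_edit"
    elif any(k in text for k in ("left", "right", "above", "next to")):
        intent = "spatial"
    else:
        intent = "global"

    # Emphasize localization early in sampling, fidelity and
    # preference late, mirroring coarse-to-fine refinement.
    early, late = 1.0 - t, t
    return {
        "semantic_alignment": 1.0,
        "perceptual_fidelity": 0.5 + 0.5 * late,
        "localization": (1.5 if intent != "global" else 0.5) * early,
        "object_consistency": 1.0 if intent == "object_edit" else 0.3,
        "human_preference": 0.8 * late,
    }
```

In the unified framework, these weights would multiply the corresponding reward gradients inside the Langevin update shown earlier.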
👥 Authors

Onkar Susladkar
University of Illinois Urbana-Champaign

Dong-Hwan Jang
University of Illinois Urbana-Champaign

Tushar Prakash
Sony Research, India

Adheesh Juvekar
University of Illinois Urbana-Champaign

Vedant Shah
University of Illinois Urbana-Champaign

Ayush Barik
University of Illinois Urbana-Champaign

Nabeel Bashir
University of Illinois Urbana-Champaign

Muntasir Wahed
University of Illinois Urbana-Champaign
Multimodal Learning, Vision Language Models, Conversational AI, Large Language Models

Ritish Shrirao
Sony Research, India

Ismini Lourentzou
Assistant Professor, University of Illinois Urbana-Champaign
Machine Learning, Natural Language Processing, Computer Vision