CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing

📅 2025-12-15
🤖 AI Summary
Existing instruction-driven image editing methods struggle to precisely control fine-grained attributes—such as color, spatial position, and object count—and rely on sparse, single-step optimization feedback, lacking trajectory-level control. To address this, we propose a Dense Gradient Flow Optimization framework that, for the first time, backpropagates reward signals continuously along the diffusion denoising trajectory. Our method integrates multimodal large-model-based instruction parsing, dynamic token-focused relocalization, and a novel Dense Group Relative Policy Optimization (GRPO) strategy, enabling end-to-end trajectory-level supervision. Evaluated on multiple benchmarks, our approach achieves state-of-the-art performance, significantly improving fine-grained instruction adherence while preserving high visual fidelity and original image editability.

📝 Abstract
Instruction-based image editing with diffusion models has achieved impressive results, yet existing methods struggle with fine-grained instructions specifying precise attributes such as colors, positions, and quantities. While recent approaches employ Group Relative Policy Optimization (GRPO) for alignment, they optimize only at individual sampling steps, providing sparse feedback that limits trajectory-level control. We propose a unified framework, CogniEdit, combining multi-modal reasoning with dense reward optimization that propagates gradients across consecutive denoising steps, enabling trajectory-level gradient flow through the sampling process. Our method comprises three components: (1) Multi-modal Large Language Models for decomposing complex instructions into actionable directives, (2) Dynamic Token Focus Relocation that adaptively emphasizes fine-grained attributes, and (3) Dense GRPO-based optimization that propagates gradients across consecutive steps for trajectory-level supervision. Extensive experiments on benchmark datasets demonstrate that our CogniEdit achieves state-of-the-art performance in balancing fine-grained instruction following with visual quality and editability preservation.
Problem

Research questions and friction points this paper is trying to address.

Optimizes fine-grained image editing with precise attribute control
Propagates dense gradients across denoising steps for trajectory-level supervision
Decomposes complex instructions using multi-modal reasoning for actionable directives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal reasoning for decomposing complex instructions
Dynamic token focus relocation for fine-grained attributes
Dense gradient flow optimization across denoising steps
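
The GRPO family of methods mentioned above scores each sample relative to the other samples drawn for the same prompt. The sketch below shows only that group-relative normalization step in plain Python. The function name `group_relative_advantages` and the `eps` parameter are illustrative assumptions, not names from the paper, and how CogniEdit propagates the resulting signal across consecutive denoising steps is not described on this page, so it is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one advantage per sample, normalized within its group.

    `rewards` holds the scalar rewards of several edits sampled for the
    same (image, instruction) pair. Names and `eps` are hypothetical.
    """
    mu = mean(rewards)        # group mean reward
    sigma = pstdev(rewards)   # population standard deviation of the group
    # Samples better than the group average get a positive advantage,
    # worse ones a negative advantage; eps avoids division by zero.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three candidate edits scored by some reward model (made-up values).
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

Because the advantages are centered on the group mean, they sum to approximately zero, so the update favors the better edits in a group rather than rewarding all of them equally.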
Yan Li
Hong Kong University of Science and Technology
Lin Liu
Huawei Company
Xiaopeng Zhang
Huawei Company
Wei Xue
Hong Kong University of Science and Technology
Wenhan Luo
Associate Professor, HKUST
Creative AI, Generative Model, Computer Vision, Machine Learning
Yike Guo
Hong Kong University of Science and Technology
Qi Tian
Huawei Company