AI Summary
In multi-objective text-driven image editing, simultaneously achieving semantic alignment and source-image consistency remains challenging. To address this, we propose a decoupling-and-attenuation framework: first, decomposing complex edits into parallel sub-editing tasks; second, under the flow-matching paradigm, orthogonally decomposing the velocity field and explicitly attenuating components that disrupt source structure. This combination of edit decoupling and orthogonal velocity attenuation enables high-quality, single-pass multi-objective editing. Our contributions are threefold: (1) We introduce Complex-PIE-Bench, the first benchmark specifically designed for multi-objective image editing; (2) Our method achieves state-of-the-art performance on both Complex-PIE-Bench and PIE-Bench, outperforming prior approaches; (3) It attains a superior trade-off between semantic accuracy and source-image fidelity, preserving source structure while faithfully realizing diverse textual instructions.
Abstract
With the surge of pre-trained text-to-image flow matching models, text-based image editing has improved remarkably, especially for simple editing, which involves only a single editing target. To meet growing editing demands, complex editing, which involves multiple editing targets, has emerged as a more challenging task. However, the two current approaches to complex editing, single-round and multi-round editing, are limited by poor long-text following and cumulative inconsistency, respectively. As a result, they struggle to strike a balance between semantic alignment and source consistency. In this paper, we propose FlowDC, which decouples a complex edit into multiple sub-editing effects and superposes them in parallel during the editing process. We also observe that the velocity component orthogonal to the editing displacement harms preservation of the source structure. We therefore decompose the velocity and decay the orthogonal part for better source consistency. To evaluate performance in complex editing settings, we construct a complex editing benchmark, Complex-PIE-Bench. On two benchmarks, FlowDC shows superior results compared with existing methods. We also present ablations of our module designs.
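The velocity decomposition described in the abstract can be illustrated with a plain vector projection. The following is a minimal sketch, not the FlowDC implementation: the function name `decay_orthogonal_velocity`, the scalar `decay` factor, and the assumption that the editing displacement is an array of the same shape as the velocity are all hypothetical choices made for illustration.

```python
import numpy as np


def decay_orthogonal_velocity(velocity, displacement, decay=0.5, eps=1e-12):
    """Split `velocity` into parts parallel and orthogonal to `displacement`,
    then scale the orthogonal part by `decay`.

    Hypothetical helper: the abstract does not specify how the displacement
    is computed or how the decay is scheduled, so both are assumptions here.
    """
    v = np.asarray(velocity, dtype=float)
    d = np.asarray(displacement, dtype=float)

    # Squared norm of the displacement, computed over all elements.
    d_flat = d.ravel()
    denom = float(d_flat @ d_flat)
    if denom < eps:
        # No well-defined editing direction: treat everything as orthogonal.
        return decay * v

    # Standard projection of v onto d.
    coeff = float(v.ravel() @ d_flat) / denom
    v_parallel = coeff * d
    v_orthogonal = v - v_parallel

    # Keep the component along the edit, attenuate the rest.
    return v_parallel + decay * v_orthogonal


if __name__ == "__main__":
    v = np.array([3.0, 4.0])
    d = np.array([1.0, 0.0])
    print(decay_orthogonal_velocity(v, d, decay=0.5))
```

With `decay=1.0` the velocity is returned unchanged, and with `decay=0.0` only the component along the displacement survives, so the factor acts as a knob between full editing freedom and motion restricted to the editing direction.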