🤖 AI Summary
Current text-guided image editing methods construct intermediate representations in a target-agnostic manner: they prioritize source image reconstruction and neglect semantic alignment with the editing objective, which leads to low fidelity and inconsistent outputs under substantial modifications. To address this, we propose FlowCycle, the first framework to introduce target-aware intermediate state modeling. It uses a learnable, target-conditioned noise parameterization to enable cycle-consistent optimization on pretrained text-to-image flow models, without requiring image inversion. It also combines region-selective corruption with content-preserving strategies to explicitly bridge the semantic gap between the source image and the editing goal. Experiments show that FlowCycle improves both editing accuracy and source fidelity on complex editing tasks, outperforming existing state-of-the-art methods.
📝 Abstract
Recent advances in pre-trained text-to-image flow models have enabled remarkable progress in text-based image editing. Mainstream approaches typically adopt a corruption-then-restoration paradigm, where the source image is first corrupted into an "intermediate state" and then restored to the target image under prompt guidance. However, current methods construct this intermediate state in a target-agnostic manner, i.e., they focus primarily on reconstructing the source image while neglecting the semantic gap to the specific editing target. This design inherently limits editability or causes inconsistency when the desired modifications deviate substantially from the source. In this paper, we argue that the intermediate state should be target-aware, i.e., it should selectively corrupt editing-relevant content while preserving editing-irrelevant content. To this end, we propose FlowCycle, a novel inversion-free, flow-based editing framework that parameterizes corruption with learnable noise and optimizes it through a cycle-consistent process. By iteratively editing the source to the target and recovering back to the source under dual consistency constraints, FlowCycle learns to produce a target-aware intermediate state, enabling faithful modifications while preserving source consistency. Extensive ablations demonstrate that FlowCycle achieves superior editing quality and consistency over state-of-the-art methods.
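The cycle described in the abstract (corrupt with learnable noise, edit toward the target, return to the source, and optimize the noise for consistency) can be illustrated with a toy sketch. This is not the FlowCycle implementation: the linear velocity field `A`, the prompt vectors `c_src` and `c_tgt`, the single Euler step in `restore`, the two loss terms and their weight `lam`, and the numerical gradient are all assumptions chosen to keep the example small and self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, T = 8, 0.6  # toy "image" size and corruption strength (both assumed)

# Hypothetical stand-in for a pretrained flow model: a fixed linear velocity
# field v(x, c) = A @ x + c, where the vector c plays the role of a prompt.
A = 0.1 * rng.standard_normal((DIM, DIM))
x_src = rng.standard_normal(DIM)  # source "image"
c_src = rng.standard_normal(DIM)  # source "prompt" (assumed to exist)
c_tgt = rng.standard_normal(DIM)  # target "prompt"


def corrupt(x, noise):
    """Interpolate between the input and a noise vector (the intermediate state)."""
    return (1.0 - T) * x + T * noise


def restore(x_mid, c):
    """One Euler step back toward data using the toy velocity field."""
    return x_mid - T * (A @ x_mid + c)


def cycle_loss(noise, lam=0.5):
    """Assumed stand-in for the dual consistency constraints."""
    x_edit = restore(corrupt(x_src, noise), c_tgt)   # source -> target
    x_back = restore(corrupt(x_edit, noise), c_src)  # target -> source
    recon = np.sum((x_back - x_src) ** 2)  # cycle consistency with the source
    keep = np.sum((x_edit - x_src) ** 2)   # edited result stays near the source
    return recon + lam * keep


def numerical_grad(f, z, h=1e-5):
    """Central-difference gradient, used here instead of autodiff for brevity."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = h
        grad[i] = (f(z + step) - f(z - step)) / (2.0 * h)
    return grad


def optimize_noise(steps=200, lr=0.05):
    """Gradient descent on the noise only; the toy model stays frozen."""
    noise = rng.standard_normal(DIM)
    initial = cycle_loss(noise)
    for _ in range(steps):
        grad = numerical_grad(cycle_loss, noise)
        step, current = lr, cycle_loss(noise)
        # Halve the step until the loss improves, so the loss never increases.
        while step > 1e-8 and cycle_loss(noise - step * grad) >= current:
            step *= 0.5
        if step > 1e-8:
            noise = noise - step * grad
    return noise, initial, cycle_loss(noise)


if __name__ == "__main__":
    _, loss_before, loss_after = optimize_noise()
    print(f"cycle loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The point of the sketch is only the optimization target: the model parameters are never updated, and the noise that defines the intermediate state is the sole learnable quantity. A real system would replace the linear field with a pretrained text-to-image flow model and the numerical gradient with automatic differentiation.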