🤖 AI Summary
Existing appearance transfer methods assume that the query-key similarity inside self-attention layers captures cross-image semantic correspondence, which often produces structural distortions and misplaced colors (e.g., a wing colored from the reference head). This work instead estimates semantic correspondences between the two images explicitly and rearranges the reference features according to those correspondences inside a pretrained text-to-image diffusion model, so the result keeps the target's structure while taking its colors from semantically matching regions of the reference. Because the correspondence is modeled directly rather than assumed, the method works even when the target and reference images are not spatially aligned. Experiments show it preserves target structure and follows semantic correspondences more faithfully than self-attention-based approaches.
📝 Abstract
As pretrained text-to-image diffusion models have become a useful tool for image synthesis, users increasingly want to control the results in various ways. In this paper, we introduce a method that produces a result with the same structure as a target image but painted with colors from a reference image, i.e., appearance transfer, while following the semantic correspondence between the result and the reference. For example, the wing in the result takes its color from the wing in the reference, not from the head. Existing methods rely on the query-key similarity within self-attention layers and often produce defective results. Instead, we propose to find semantic correspondences and explicitly rearrange the features according to them. Extensive experiments show the superiority of our method in various aspects: it preserves the structure of the target and reflects the colors of the reference according to the semantic correspondences, even when the two images are not aligned.
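The core operation described above — matching each target location to its semantically closest reference location and copying the reference feature there — can be sketched as a nearest-neighbor lookup in feature space. The snippet below is a minimal NumPy illustration under assumed `(C, H, W)` feature maps and cosine-similarity matching; the function name and the single-scale, hard-argmax matching are simplifications for exposition, not the paper's exact procedure.

```python
import numpy as np

def rearrange_features(target_feats, reference_feats):
    """Rearrange reference features into the target's spatial layout.

    For each target location, find the semantically closest reference
    location by cosine similarity of per-pixel feature vectors, then
    copy that reference feature into the target position.
    Both inputs have shape (C, H, W).
    """
    C, H, W = target_feats.shape
    t = target_feats.reshape(C, -1).T            # (H*W, C) target vectors
    r = reference_feats.reshape(C, -1).T         # (H*W, C) reference vectors
    # L2-normalize so the dot product equals cosine similarity
    t_n = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    r_n = r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-8)
    sim = t_n @ r_n.T                            # (HW_target, HW_ref) similarity
    match = sim.argmax(axis=1)                   # best reference index per target pixel
    # Gather the original (unnormalized) reference features
    out = reference_feats.reshape(C, -1)[:, match]
    return out.reshape(C, H, W)
```

In a diffusion pipeline this rearrangement would be applied to intermediate features during denoising, so the appearance information injected at each spatial position comes from the semantically corresponding region of the reference rather than from whatever the raw attention weights happen to select.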