AI Summary
Image personalization in text-to-image diffusion models suffers from concept coupling: because reference images are sparse, models erroneously associate the target subject with irrelevant semantics (e.g., background, style), undermining the trade-off between textual controllability and personalization fidelity. This work is the first to identify the two underlying mechanisms: *cross-concept noise-prediction dependency bias* and *prior-distribution dependency bias*. To address them, we propose two plug-and-play losses, the Denoising Decouple Loss and the Prior Decouple Loss, which jointly enforce dependency regularization and statistical independence in latent space during fine-tuning, enabling precise decoupling of subject identity from extraneous attributes. Our method preserves strong text control while significantly improving fidelity, achieving a state-of-the-art trade-off across multiple benchmarks.
Abstract
Image personalization has garnered attention for its ability to customize text-to-image generation using only a few reference images. However, a key challenge is concept coupling, where the limited number of reference images leads the model to form unwanted associations between the personalization target and other concepts. Current methods tackle this issue only indirectly, yielding a suboptimal balance between text control and personalization fidelity. In this paper, we approach the concept coupling problem directly through statistical analysis, revealing that it stems from two distinct sources of dependence discrepancy. We therefore propose two complementary plug-and-play loss functions, the Denoising Decouple Loss and the Prior Decouple Loss, each designed to minimize one type of dependence discrepancy. Extensive experiments demonstrate that our approach achieves a superior trade-off between text control and personalization fidelity.
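Neither the summary nor the abstract gives implementation details, but since both describe the method as two plug-and-play losses added during fine-tuning, a minimal sketch may help clarify what that means in practice. The code below assumes a standard PyTorch noise-prediction fine-tuning loop; the function names, the specific dependency and independence penalties, and the loss weights `lambda_dd` / `lambda_pd` are all illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def denoising_decouple_loss(eps_subject: torch.Tensor,
                            eps_context: torch.Tensor) -> torch.Tensor:
    """Assumed penalty on cross-concept noise-prediction dependency:
    push subject and context noise predictions toward zero correlation."""
    return (eps_subject * eps_context).mean().abs()

def prior_decouple_loss(z_subject: torch.Tensor,
                        z_prior: torch.Tensor) -> torch.Tensor:
    """Assumed penalty on prior-distribution dependency: match the first-
    and second-moment statistics of subject latents to prior latents."""
    mean_gap = F.mse_loss(z_subject.mean(dim=0), z_prior.mean(dim=0))
    var_gap = F.mse_loss(z_subject.var(dim=0), z_prior.var(dim=0))
    return mean_gap + var_gap

def personalization_loss(eps_pred, eps_true, eps_subject, eps_context,
                         z_subject, z_prior, lambda_dd=0.1, lambda_pd=0.1):
    """Standard denoising objective plus the two additive decoupling terms."""
    l_denoise = F.mse_loss(eps_pred, eps_true)
    return (l_denoise
            + lambda_dd * denoising_decouple_loss(eps_subject, eps_context)
            + lambda_pd * prior_decouple_loss(z_subject, z_prior))
```

Because both terms are additive regularizers on existing quantities (noise predictions and latents), a formulation like this could in principle be attached to any personalization fine-tuning method without architectural changes, which is presumably what "plug-and-play" refers to here.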