🤖 AI Summary
To address core challenges in personalized avatar generation—namely, degradation of identity preservation and inconsistency across viewpoints and illumination—we propose an end-to-end disentangled framework that decomposes generation into two subtasks: illumination-aware stitching and viewpoint-consistent adaptation. Methodologically, we introduce a context-aware dense correspondence matching mechanism built upon pretrained diffusion models; design a high-ratio masked self-supervised strategy to learn robust illumination representations; and construct a synthetic viewpoint-consistent profile dataset to enable 3D-aware spatial alignment control. Our approach integrates diffusion modeling, latent-space ControlNet-style stitching supervision, in-context correspondence learning, and explicit illumination–deformation disentanglement. Experiments demonstrate state-of-the-art performance in identity fidelity, generation stability, and 3D-aware relighting capability.
📝 Abstract
Existing diffusion models show great potential for identity-preserving generation. However, personalized portrait generation remains challenging due to the diversity of user profiles, including variations in appearance and lighting conditions. To address these challenges, we propose IC-Portrait, a novel framework designed to accurately encode individual identities for personalized portrait generation. Our key insight is that pre-trained diffusion models are fast learners (e.g., 100–200 steps) of in-context dense correspondence matching, which motivates the two major designs of our IC-Portrait framework. Specifically, we reformulate portrait generation into two sub-tasks: 1) Lighting-Aware Stitching: we find that masking a high proportion of the input image, e.g., 80%, yields a highly effective self-supervised representation of the reference image's lighting. 2) View-Consistent Adaptation: we leverage a synthetic view-consistent profile dataset to learn in-context correspondence. The reference profile can then be warped into arbitrary poses for strong spatially aligned view conditioning. Coupling these two designs by simply concatenating latents to form ControlNet-like supervision and modeling enables us to significantly enhance identity preservation fidelity and stability. Extensive evaluations demonstrate that IC-Portrait consistently outperforms existing state-of-the-art methods both quantitatively and qualitatively, with particularly notable improvements in visual quality. Furthermore, IC-Portrait even demonstrates 3D-aware relighting capabilities.
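The two coupled mechanisms from the abstract—high-ratio patch masking of the reference latent, and channel-wise latent concatenation for ControlNet-like conditioning—can be sketched as follows. This is a minimal NumPy illustration under assumed shapes (4-channel 32×32 latents, as in typical SD-style VAEs); the helper names (`random_high_ratio_mask`, `controlnet_style_condition`) and the patch size are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def random_high_ratio_mask(latent, mask_ratio=0.8, patch=4, rng=None):
    """Zero out a high proportion (e.g., 80%) of spatial patches.

    Hypothetical helper: the model must then reconstruct lighting cues
    from the sparse visible context, which is the self-supervised signal
    the abstract describes. latent: (C, H, W) array.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    c, h, w = latent.shape
    gh, gw = h // patch, w // patch
    n = gh * gw
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    keep = np.zeros(n, dtype=bool)
    keep[rng.choice(n, size=n_keep, replace=False)] = True
    # Expand the patch-level mask back to pixel resolution.
    mask = keep.reshape(gh, gw).repeat(patch, axis=0).repeat(patch, axis=1)
    return latent * mask, mask

def controlnet_style_condition(noisy_latent, masked_ref_latent, warped_view_latent):
    """Concatenate latents channel-wise to form the spatially aligned
    conditioning input fed to a ControlNet-like branch (hypothetical)."""
    return np.concatenate([noisy_latent, masked_ref_latent, warped_view_latent], axis=0)

rng = np.random.default_rng(42)
ref = rng.standard_normal((4, 32, 32))             # reference-image latent
masked_ref, mask = random_high_ratio_mask(ref, mask_ratio=0.8, rng=rng)
cond = controlnet_style_condition(
    rng.standard_normal((4, 32, 32)),              # noisy target latent
    masked_ref,
    rng.standard_normal((4, 32, 32)),              # warped-profile latent
)
print(cond.shape)   # (12, 32, 32): three 4-channel latents stacked
print(mask.mean())  # ~0.2 of pixels remain visible
```

The key property illustrated here is that all three conditioning signals share the same spatial grid, so a plain channel concatenation keeps them pixel-aligned—this is what makes the "spatially aligned view conditioning" in the abstract possible without any cross-attention machinery.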