🤖 AI Summary
This paper addresses head swapping, a task related to but more challenging than face swapping: beyond skin color transfer, it requires preserving the structural information of the whole head and inpainting the gaps between the swapped head and the target background. The proposed framework, GHOST 2.0, consists of two problem-specific modules: an enhanced Aligner model for head reenactment that preserves identity information at multiple scales and is robust to extreme pose variations, and a Blender module that integrates the reenacted head into the target background by transferring skin color and inpainting mismatched regions. Both modules outperform their baselines, yielding state-of-the-art head swapping results, including in complex cases such as large differences between source and target hairstyles.
📝 Abstract
While the task of face swapping has recently gained attention in the research community, the related problem of head swapping remains largely unexplored. In addition to skin color transfer, head swapping poses extra challenges, such as the need to preserve structural information of the whole head during synthesis and to inpaint gaps between the swapped head and the background. In this paper, we address these concerns with GHOST 2.0, which consists of two problem-specific modules. First, we introduce an enhanced Aligner model for head reenactment, which preserves identity information at multiple scales and is robust to extreme pose variations. Second, we use a Blender module that seamlessly integrates the reenacted head into the target background by transferring skin color and inpainting mismatched regions. Both modules outperform the baselines on their respective tasks, allowing us to achieve state-of-the-art results in head swapping. We also tackle complex cases, such as large differences between the hairstyles of source and target.