🤖 AI Summary
This work addresses full-body portrait pose transfer without paired training data: transforming casual selfies into high-fidelity professional portraits with studio lighting, standardized attire (black clothing), an ideal pose, and a clean background, while preserving identity, facial geometry, and body structure. The proposed method introduces a cross-domain pose transfer framework built on a learnable canonical UV mapping, which unifies occlusion handling and novel-view synthesis in UV space and integrates an implicit human representation with neural rendering. It further supports multi-image fine-tuning for personalized output. Experiments demonstrate that the approach significantly outperforms existing unpaired methods on real-world images, achieving state-of-the-art performance both quantitatively and qualitatively, and enables robust pose transfer from arbitrary source poses to target poses.
📝 Abstract
Photographs of people taken by professional photographers typically present the subject in beautiful lighting, with an interesting pose and flattering composition, unlike the casual photos people take of themselves. In this paper, we explore how to create a "professional" version of a person's photograph, i.e., in a chosen pose, in a simple environment, with good lighting, and in standard black top/bottom clothing. A key challenge is to preserve the person's unique identity, face, and body features while transforming the photo. If a large paired dataset existed of the same people photographed both "in the wild" and by a professional photographer, the problem would potentially be easier to solve. However, no such data exists, especially across a large variety of identities. To that end, we propose two key insights: 1) Our method transforms the input photo and the person's face into a canonical UV space, coupled with a reposing methodology to model occlusions and novel-view synthesis; operating in UV space allows us to leverage existing unpaired datasets. 2) We personalize the output photo via multi-image fine-tuning. Our approach yields high-quality, reposed portraits and achieves strong qualitative and quantitative performance on real-world imagery.
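The abstract does not give implementation details, but the core idea of mapping a photo into a canonical UV space can be sketched in a DensePose-style setup: each foreground pixel carries a body-part index and (u, v) surface coordinates, and pixels are scattered into per-part texture maps, where occlusion reasoning and novel-view synthesis reduce to 2D operations on the texture. Everything below (function names, the IUV input format, resolutions) is an illustrative assumption, not the authors' code.

```python
import numpy as np

def scatter_to_uv(image, part_idx, u, v, n_parts=24, tex_res=64):
    """Scatter image pixels into per-part canonical UV texture maps.

    image:    (H, W, 3) float array of pixel colors
    part_idx: (H, W) int array, 0 = background, 1..n_parts = body part
    u, v:     (H, W) float arrays in [0, 1], per-pixel UV coordinates
    Returns a (n_parts, tex_res, tex_res, 3) texture and a visibility
    mask marking which texels were observed in the input photo.
    """
    tex = np.zeros((n_parts, tex_res, tex_res, 3), dtype=np.float32)
    mask = np.zeros((n_parts, tex_res, tex_res), dtype=bool)
    fg = part_idx > 0                              # foreground pixels only
    p = part_idx[fg] - 1                           # 0-based part index
    ui = np.clip((u[fg] * (tex_res - 1)).astype(int), 0, tex_res - 1)
    vi = np.clip((v[fg] * (tex_res - 1)).astype(int), 0, tex_res - 1)
    tex[p, vi, ui] = image[fg]                     # scatter colors into UV
    mask[p, vi, ui] = True                         # record observed texels
    return tex, mask

# Tiny synthetic example: a 4x4 "photo" whose top two rows map to part 1.
rng = np.random.default_rng(0)
img = rng.random((4, 4, 3)).astype(np.float32)
parts = np.zeros((4, 4), dtype=int)
parts[:2] = 1
uu = np.tile(np.linspace(0, 1, 4), (4, 1))         # u varies along columns
vv = np.tile(np.linspace(0, 1, 4)[:, None], (1, 4))  # v varies along rows
tex, vis = scatter_to_uv(img, parts, uu, vv, tex_res=8)
print(tex.shape, int(vis[0].sum()))
```

Unobserved texels (mask is False) are exactly the occluded surface regions; a reposing model can then inpaint them in UV space and re-render, which is why this representation unifies occlusion handling and novel-view synthesis.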