High-Quality 3D Head Reconstruction from Any Single Portrait Image

📅 2025-03-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Reconstructing high-fidelity 3D head models from a single portrait image remains challenging—especially under cross-view, multi-expression, and occluded conditions (e.g., accessories), where existing methods suffer from limited robustness and fidelity. To address this, we introduce the first high-quality, 96-view portrait dataset tailored for digital human modeling, and propose an identity- and expression-aware multi-view diffusion guidance mechanism that significantly improves cross-view facial consistency. Our framework integrates multi-view diffusion modeling, explicit geometric and texture supervision, orbit video generation, and neural radiance field (NeRF) reconstruction. Experiments demonstrate state-of-the-art performance in challenging scenarios—including extreme poses, partial occlusions, and accessory-induced distortions—enabling precise 3D head reconstruction driven by a single input image. The method supports fine-grained modeling via 96-frame orbit videos, establishing a novel paradigm for single-image-driven, high-fidelity digital human generation.

Technology Category

Application Category

📝 Abstract
In this work, we introduce a novel high-fidelity 3D head reconstruction method from a single portrait image, regardless of perspective, expression, or accessories. Despite significant efforts in adapting 2D generative models for novel view synthesis and 3D optimization, most methods struggle to produce high-quality 3D portraits. The lack of crucial information, such as identity, expression, hair, and accessories, limits these approaches in generating realistic 3D head models. To address these challenges, we construct a new high-quality dataset containing 227 sequences of digital human portraits captured from 96 different perspectives, totalling 21,792 frames, featuring diverse expressions and accessories. To further improve performance, we integrate identity and expression information into the multi-view diffusion process to enhance facial consistency across views. Specifically, we apply identity- and expression-aware guidance and supervision to extract accurate facial representations, which guide the model and enforce objective functions to ensure high identity and expression consistency during generation. Finally, we generate an orbital video around the portrait consisting of 96 multi-view frames, which can be used for 3D portrait model reconstruction. Our method demonstrates robust performance across challenging scenarios, including side-face angles and complex accessories
Problem

Research questions and friction points this paper is trying to address.

High-fidelity 3D head reconstruction from single portrait images.
Overcoming limitations in identity, expression, and accessory representation.
Generating realistic 3D models with consistent facial features across views.
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-fidelity 3D head reconstruction from single image
Multi-view diffusion with identity and expression guidance
Orbital video generation for 3D model reconstruction
🔎 Similar Papers
No similar papers found.