Topologically Consistent Multi-view 3D Head Reconstruction via Coarse-Guided Layered Surface Sampling

📅 2026-05-29

🤖 AI Summary
Existing multi-view 3D head reconstruction methods refine each vertex independently, which leads to high memory consumption, limited scalability to dense topologies, and surface noise. This work proposes SHELLS, a framework that is the first to apply coarse-mesh-guided hierarchical surface sampling to this task. Using a DINOv2 backbone with LoRA adapters, SHELLS extracts a sparse global feature cloud and constructs surface-oriented sampling shells conditioned on a coarse mesh, which decouples feature extraction from mesh resolution. Although trained exclusively on synthetic data, the method generalizes robustly to real-world captures without requiring costly pre-registered multi-view datasets. Compared with volumetric (voxel-based) baselines, SHELLS reduces inference GPU memory by 88% (2.4 GB vs. 20 GB), runs 3.5× faster on 18k-vertex meshes (0.08 s vs. 0.29 s), and lowers median registration error by 21% to 29%.
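The summary names the step of turning multi-view image features into a sparse 3D feature cloud but gives no details. A minimal sketch of generic projective sampling follows, assuming pinhole cameras given as 3×4 projection matrices, nearest-pixel lookup (bilinear interpolation would be the usual refinement), and plain averaging over the views into which a point projects. The names `project` and `sample_feature_cloud` and the toy feature maps are hypothetical stand-ins for the DINOv2/LoRA features; this is not the authors' implementation.

```python
# Hypothetical sketch: projective sampling of multi-view feature maps into a
# sparse 3D feature cloud. Not the SHELLS implementation.
from typing import Optional

Point3 = tuple[float, float, float]
Matrix34 = list[list[float]]
FeatureMap = list[list[list[float]]]  # [row][col] -> C-dim feature vector


def project(P: Matrix34, X: Point3) -> Optional[tuple[float, float]]:
    """Project a 3D point with a 3x4 pinhole projection matrix; None if behind the camera."""
    Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None
    return u / w, v / w


def sample_feature_cloud(
    points: list[Point3],
    cameras: list[Matrix34],
    feature_maps: list[FeatureMap],
) -> list[Optional[list[float]]]:
    """Average nearest-pixel features over every view whose image a point projects into.

    No occlusion test is performed (simplifying assumption).
    """
    cloud: list[Optional[list[float]]] = []
    for X in points:
        acc: Optional[list[float]] = None
        count = 0
        for P, fmap in zip(cameras, feature_maps):
            uv = project(P, X)
            if uv is None:
                continue
            col, row = round(uv[0]), round(uv[1])  # nearest pixel (assumption)
            if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
                f = fmap[row][col]
                acc = list(f) if acc is None else [a + b for a, b in zip(acc, f)]
                count += 1
        cloud.append([a / count for a in acc] if acc is not None else None)
    return cloud


if __name__ == "__main__":
    # Two toy cameras that only shift the image origin; feature at (row, col) is [row, col].
    P1 = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 2.0], [0.0, 0.0, 1.0, 0.0]]
    P2 = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
    fmap = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    pts = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (10.0, 10.0, 1.0)]
    print(sample_feature_cloud(pts, [P1, P2], [fmap, fmap]))  # [[1.5, 1.5], None, None]
```

Points that fall behind every camera or outside every image map to `None`, which keeps missing samples distinguishable from genuine zero-valued features.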
📝 Abstract
We present SHELLS (Semantic Head Estimation via Layered Local Sampling), an efficient feed-forward framework for 3D head reconstruction in dense semantic correspondence from multi-view images. Existing methods typically refine vertices independently via localized feature volumes. This approach couples memory-intensive feature sampling to mesh resolution, which limits scalability for dense topologies (>10k vertices) and introduces surface noise. In contrast, SHELLS decouples feature extraction from mesh resolution via a hierarchical sampling strategy. We extract multi-view features using a DINOv2 backbone with LoRA adaptation, projectively sample a sparse global feature cloud, and predict an intermediate coarse mesh. This coarse prior guides the construction of layered, surface-aware sampling shells that serve as a discrete search space for the final reconstruction. SHELLS maintains surface consistency while using 88% less inference GPU memory (2.4 GB vs. 20 GB) than volumetric baselines. It reduces median registration error by 21% to 29% with a 3.5× inference speedup (0.08 s vs. 0.29 s) for 18k-vertex meshes. Notably, our model is trained exclusively on synthetic data yet generalizes effectively to real-world captures, eliminating the need for the costly, pre-registered multi-view datasets common in prior work.
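One plausible reading of "layered, surface-aware sampling shells" as a discrete search space is sketched below: offset each coarse vertex along its vertex normal by a fixed set of distances to form K layers, then choose a final position per vertex with a softmax over per-layer scores (a soft-argmax). The normal-offset construction, the offsets, the scores, and the helper names (`vertex_normals`, `build_shells`, `select_on_shells`) are assumptions for illustration only; in the actual method the scores would come from the learned network, and the abstract does not state the selection rule.

```python
# Hypothetical sketch: normal-offset "shells" around a coarse mesh used as a
# discrete per-vertex search space, with a soft-argmax over layers.
# Not the SHELLS implementation.
import math

Vec3 = tuple[float, float, float]


def _sub(a, b) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (0.0, 0.0, 0.0) if n == 0 else (v[0] / n, v[1] / n, v[2] / n)


def vertex_normals(vertices: list[Vec3], faces: list[tuple[int, int, int]]) -> list[Vec3]:
    """Area-weighted vertex normals: sum unnormalized face normals, then normalize."""
    acc = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in faces:
        fn = _cross(_sub(vertices[b], vertices[a]), _sub(vertices[c], vertices[a]))
        for i in (a, b, c):
            for k in range(3):
                acc[i][k] += fn[k]
    return [_normalize(n) for n in acc]


def build_shells(vertices, normals, offsets) -> list[list[Vec3]]:
    """shells[k][i] = vertices[i] + offsets[k] * normals[i] (one layer per offset)."""
    return [[tuple(v[j] + d * n[j] for j in range(3)) for v, n in zip(vertices, normals)]
            for d in offsets]


def select_on_shells(shells, scores, temperature: float = 1.0) -> list[Vec3]:
    """Soft-argmax per vertex: softmax over its K layer scores, then the weighted mean position.

    scores[i][k] is a (hypothetical) network score for vertex i on layer k.
    """
    refined = []
    for i, s in enumerate(scores):
        m = max(s)  # subtract the max for numerical stability
        w = [math.exp((x - m) / temperature) for x in s]
        z = sum(w)
        refined.append(tuple(
            sum(w[k] / z * shells[k][i][j] for k in range(len(shells))) for j in range(3)))
    return refined


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    normals = vertex_normals(verts, [(0, 1, 2)])  # each normal is (0.0, 0.0, 1.0)
    shells = build_shells(verts, normals, [-0.1, 0.0, 0.1])
    refined = select_on_shells(shells, [[0.0, 0.0, 10.0]] * 3)
    print([round(p[2], 3) for p in refined])  # [0.1, 0.1, 0.1]
```

Because each layer position is `v + d_k * n`, the softmax-weighted mean equals `v + (Σ w_k d_k) * n`: every refined vertex moves only along its coarse normal and never farther than the largest offset, which bounds how far it can drift from the coarse surface. A hard argmax would instead snap each vertex to a single layer.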
Problem

Research questions and friction points this paper is trying to address.

3D head reconstruction
multi-view images
dense topology
surface consistency
semantic correspondence
Innovation

Methods, ideas, or system contributions that make the work stand out.

layered surface sampling
topologically consistent reconstruction
coarse-guided sampling
memory-efficient 3D reconstruction
synthetic-to-real generalization