Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior

📅 2025-03-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address poor generalization to novel poses and cross-view distortion when reconstructing animatable 3D human avatars from monocular in-the-wild videos, this paper proposes Vid2Avatar-Pro, built around a universal prior model (UPM). The UPM learns an identity-agnostic, pose-conditioned prior from large-scale multi-view clothed human capture data, representing avatars as expressive 3D Gaussians anchored on canonical front and back maps shared across identities. Given a monocular video, the pretrained prior is personalized via inverse rendering, so no multi-view supervision of the target subject is required. Evaluations on public benchmarks show that the approach substantially outperforms methods relying only on a minimally clothed shape prior (e.g., SMPL) or heuristic regularization, setting a new state of the art in monocular avatar reconstruction with high-fidelity animation and photorealistic novel-view synthesis.

๐Ÿ“ Abstract
We present Vid2Avatar-Pro, a method to create photorealistic and animatable 3D human avatars from monocular in-the-wild videos. Building a high-quality avatar that supports animation with diverse poses from a monocular video is challenging because the observed pose diversity and viewpoints are inherently limited. The lack of pose variation typically leads to poor generalization to novel poses, and avatars can easily overfit to the limited input viewpoints, producing artifacts and distortions when rendered from other views. In this work, we address these limitations by leveraging a universal prior model (UPM) learned from a large corpus of multi-view clothed human performance capture data. We build our representation on top of expressive 3D Gaussians with canonical front and back maps shared across identities. Once the UPM is learned to accurately reproduce the large-scale multi-view human images, we fine-tune the model with an in-the-wild video via inverse rendering to obtain a personalized photorealistic human avatar that can be faithfully animated to novel human motions and rendered from novel views. The experiments show that our approach, based on the learned universal prior, sets a new state of the art in monocular avatar reconstruction, substantially outperforming existing approaches that rely only on heuristic regularization or a shape prior of minimally clothed bodies (e.g., SMPL) on publicly available datasets.
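
As a rough illustration of the personalization stage described in the abstract, the toy loop below fine-tunes a pose-conditioned Gaussian prior against monocular video frames with a photometric inverse-rendering loss. This is a minimal sketch under stated assumptions, not the paper's implementation: `ToyPriorModel`, `toy_render`, `personalize`, and all tensor shapes are hypothetical stand-ins, and the real method rasterizes full 3D Gaussians (means, rotations, scales, opacities, colors) predicted from shared canonical front and back maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPriorModel(nn.Module):
    """Toy stand-in for the universal prior: maps a pose vector to Gaussian means.

    The real UPM predicts full Gaussian parameters anchored on canonical
    front/back maps shared across identities; a small MLP is enough to
    demonstrate the optimization loop.
    """
    def __init__(self, pose_dim: int = 72, num_gaussians: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, num_gaussians * 3),  # predict 3D means only, for brevity
        )
        self.num_gaussians = num_gaussians

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        return self.net(pose).view(self.num_gaussians, 3)

def toy_render(means: torch.Tensor, camera: torch.Tensor) -> torch.Tensor:
    """Differentiable placeholder for Gaussian rasterization (not a real splatter)."""
    proj = means @ camera                    # (N, 3) @ (3, H*W) -> (N, H*W)
    return torch.tanh(proj.mean(dim=0)).view(16, 16)

def personalize(prior, frames, poses, cameras, steps=200, lr=1e-4):
    """Fine-tune the pretrained prior on one monocular video (photometric loss)."""
    opt = torch.optim.Adam(prior.parameters(), lr=lr)
    for step in range(steps):
        i = step % len(frames)               # cycle through the limited frames
        pred = toy_render(prior(poses[i]), cameras[i])
        loss = F.l1_loss(pred, frames[i])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prior                             # now personalized to the subject
```

For example, `personalize(ToyPriorModel(), frames=[torch.rand(16, 16)], poses=[torch.randn(72)], cameras=[torch.randn(3, 256)])` runs the loop on a single synthetic frame. The point of the sketch is the division of labor: only monocular frames supervise the fine-tuning, while the prior's pretrained weights carry the multi-view, multi-identity knowledge.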
Problem

Research questions and friction points this paper is trying to address.

Create photorealistic, animatable 3D avatars from monocular in-the-wild videos
Overcome limited pose and viewpoint diversity in monocular input
Improve generalization to novel poses and views
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns a universal prior model (UPM) from large-scale multi-view clothed human capture data
Represents avatars with 3D Gaussians on canonical front and back maps shared across identities (sketched below)
Personalizes the prior to a monocular video via inverse rendering
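
To make the canonical front/back map idea concrete, below is a minimal sketch assuming Gaussians stored as texels of two shared 2D maps are articulated with linear blend skinning (LBS). The function name, the (H, W, 3) texel layout, and the `skin_weights`/`bone_T` inputs are assumptions for illustration; the paper's actual mapping between canonical and posed space may differ.

```python
import torch

def pose_canonical_gaussians(front_map: torch.Tensor,
                             back_map: torch.Tensor,
                             skin_weights: torch.Tensor,
                             bone_T: torch.Tensor) -> torch.Tensor:
    """Warp canonical Gaussian means to a target pose via LBS (illustrative).

    front_map, back_map: (H, W, 3) canonical 3D means stored as texels of
        shared front/back body maps, so every identity uses the same layout.
    skin_weights: (2*H*W, J) per-Gaussian blend weights over J joints.
    bone_T: (J, 4, 4) rigid bone transforms for the target pose.
    """
    means = torch.cat([front_map.reshape(-1, 3), back_map.reshape(-1, 3)])
    homog = torch.cat([means, torch.ones(len(means), 1)], dim=1)  # (N, 4)
    blended = torch.einsum('nj,jab->nab', skin_weights, bone_T)   # (N, 4, 4)
    posed = torch.einsum('nab,nb->na', blended, homog)[:, :3]
    return posed                                                  # (N, 3)

# Smoke test: with identity bone transforms, posed means equal canonical means.
H, W, J = 8, 8, 24
front, back = torch.randn(H, W, 3), torch.randn(H, W, 3)
weights = torch.softmax(torch.randn(2 * H * W, J), dim=-1)
posed = pose_canonical_gaussians(front, back, weights, torch.eye(4).expand(J, 4, 4))
assert torch.allclose(
    posed, torch.cat([front.reshape(-1, 3), back.reshape(-1, 3)]), atol=1e-5)
```

Forward mapping of this kind poses the avatar for animation, and the inverse-rendering fine-tuning sketched after the abstract backpropagates photometric gradients through it to the canonical maps.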