AI Summary
To address poor generalization and cross-view distortion in monocular in-the-wild video reconstruction of animatable 3D human avatars, this paper proposes a novel Universal Prior Model (UPM) framework. UPM distills an identity-agnostic pose-geometry prior from large-scale multi-view clothed human data and combines forward/backward Gaussian mapping with inverse-rendering optimization for robust, personalized modeling from monocular input. Leveraging 3D Gaussians as an explicit representation, UPM significantly improves novel-pose synthesis and free-viewpoint rendering quality without requiring multi-view supervision. Extensive evaluations on public benchmarks demonstrate that UPM consistently outperforms shape priors (e.g., SMPL) and heuristic regularization methods, establishing new state-of-the-art performance in monocular human reconstruction. The method enables high-fidelity animation and photorealistic novel-view synthesis.
Abstract
We present Vid2Avatar-Pro, a method to create photorealistic and animatable 3D human avatars from monocular in-the-wild videos. Building a high-quality avatar that supports animation with diverse poses from a monocular video is challenging because the observed pose diversity and viewpoints are inherently limited. The lack of pose variation typically leads to poor generalization to novel poses, and avatars can easily overfit to the limited input viewpoints, producing artifacts and distortions when rendered from other views. In this work, we address these limitations by leveraging a universal prior model (UPM) learned from a large corpus of multi-view clothed human performance capture data. We build our representation on top of expressive 3D Gaussians with canonical front and back maps shared across identities. Once the UPM has been trained to accurately reproduce the large-scale multi-view human images, we fine-tune the model on an in-the-wild video via inverse rendering to obtain a personalized photorealistic human avatar that can be faithfully animated with novel human motions and rendered from novel views. Experiments show that our approach, based on the learned universal prior, sets a new state of the art in monocular avatar reconstruction, substantially outperforming existing approaches that rely only on heuristic regularization or a shape prior of minimally clothed bodies (e.g., SMPL) on publicly available datasets.
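To make the representation concrete, the idea of storing Gaussian parameters in shared canonical front/back maps and animating them could be sketched roughly as below. This is a minimal NumPy toy, not the paper's implementation: the map sizes, the skinning weights, and the use of plain linear blend skinning (LBS) for posing are all illustrative assumptions.

```python
import numpy as np

# Toy sketch (NOT the paper's code): Gaussian centers live in shared
# canonical front/back maps, one 3D position per texel, and are posed
# with linear blend skinning. All shapes/values are illustrative.

H, W = 4, 4                      # tiny texel grid for illustration
B = 2                            # number of skeleton bones

rng = np.random.default_rng(0)
# Canonical Gaussian centers: a front map and a back map, (H, W, 3) each.
front = rng.normal(size=(H, W, 3))
back = rng.normal(size=(H, W, 3))
canonical = np.concatenate([front, back], axis=0).reshape(-1, 3)  # (2HW, 3)

# Hypothetical per-texel skinning weights over B bones (rows sum to 1).
weights = rng.random(size=(canonical.shape[0], B))
weights /= weights.sum(axis=1, keepdims=True)

# Per-bone rigid transforms (rotation R, translation t) for a target pose.
def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = np.stack([rot_z(0.3), rot_z(-0.5)])           # (B, 3, 3)
t = np.array([[0.0, 0.1, 0.0], [0.2, 0.0, 0.0]])  # (B, 3)

# LBS: transform every canonical center by each bone, then blend.
per_bone = np.einsum('bij,nj->nbi', R, canonical) + t  # (N, B, 3)
posed = np.einsum('nb,nbi->ni', weights, per_bone)     # (N, 3)

print(posed.shape)  # (32, 3) posed Gaussian centers, ready to splat
```

In the actual method, the canonical maps additionally carry the other Gaussian attributes (covariance, color, opacity), and personalization fine-tunes these maps against the monocular video through inverse rendering.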