🤖 AI Summary
This paper addresses three key challenges in realistic human geometry animation generation: loss of fine geometric details, unnatural clothing dynamics, and the difficulty of modeling from limited data. To this end, we propose a generative framework built on a distribution-based latent space. Methodologically, we (1) construct an SMPL-guided compact implicit representation that yields a more uniform geometric mapping, and (2) design an identity-conditioned two-stage diffusion model that jointly ensures short-term dynamic diversity and long-term motion consistency. Our core contribution is the first integration of distribution-based latent space learning with SMPL geometric priors, significantly improving modeling efficiency and generalization under data scarcity. Experiments demonstrate a 90% reduction in Chamfer distance for implicit surface reconstruction and user-study scores 2.2× higher than those of prior SOTA methods, with consistent superiority across all quantitative and qualitative metrics.
📝 Abstract
Generating realistic human geometry animations remains a challenging task, as it requires modeling natural clothing dynamics with fine-grained geometric details under limited data. To address these challenges, we propose two novel designs. First, we propose a compact distribution-based latent representation that enables efficient and high-quality geometry generation. We improve upon previous work by establishing a more uniform mapping between SMPL and avatar geometries. Second, we introduce a generative animation model that fully exploits the diversity of limited motion data. We focus on short-term transitions while maintaining long-term consistency through an identity-conditioned design. These two designs formulate our method as a two-stage framework: the first stage learns a latent space, while the second learns to generate animations within this latent space. We conducted experiments on both our latent space and animation model. We demonstrate that our latent space produces high-fidelity human geometry surpassing previous methods ($90\%$ lower Chamfer Dist.). The animation model synthesizes diverse animations with detailed and natural dynamics ($2.2\times$ higher user study score), achieving the best results across all evaluation metrics.
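Chamfer distance is the headline reconstruction metric above. The paper's exact evaluation variant (squared vs. unsquared, mean vs. sum) is not specified here, so the following is a minimal sketch of the standard symmetric formulation with mean squared nearest-neighbor distances, using brute-force pairwise distances in NumPy:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    For each point in one set, find the squared Euclidean distance to its
    nearest neighbor in the other set; average within each direction and
    sum the two directions. (One common convention among several.)
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # a -> b term plus b -> a term.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Identical point sets have zero Chamfer distance.
pts = np.random.rand(128, 3)
print(chamfer_distance(pts, pts))  # → 0.0
```

For real meshes, points are typically sampled from the surfaces first, and a KD-tree (e.g. `scipy.spatial.cKDTree`) replaces the O(NM) pairwise matrix for efficiency; the brute-force version above is for clarity only.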