MotionDreamer: Universal Skeletal Motion Generation for 3D Rigged Shapes

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing skeletal animation methods fall into two camps, each with a limitation. Template-based methods are tied to a fixed topology and do not generalize to diverse morphologies. Per-instance optimization is computationally expensive, prone to local optima, and sensitive to viewpoint ambiguities. This work proposes MotionDreamer, a category-agnostic diffusion framework that generates 3D skeletal animations guided by 2D videos. Its key contribution is a structural-semantic injection mechanism that embeds texture and semantic attributes into joint representations, so that 2D motion dynamics can be mapped to heterogeneous 3D skeletons according to each joint's place in the hierarchy and its functional role. To support this approach, the authors curate a dataset of approximately 20,000 high-quality 3D models with articulated skeletons, textures, and animations. The authors report that MotionDreamer produces high-fidelity, anatomically consistent animations for unseen categories, including both real and fantastical creatures, and that it significantly outperforms prior methods for 4D asset generation.

📝 Abstract

Motion generation for rigged shapes is vital for scalable 4D asset production. However, template-based methods are limited by specific topologies and fail to generalize across diverse morphologies. Conversely, per-case optimization is computationally expensive, susceptible to local optima, and highly sensitive to viewpoint-induced ambiguities. In this paper, we present MotionDreamer, a diffusion-based framework designed for category-agnostic skeletal animation generation from 2D video guidance. To overcome the scarcity of high-quality training data, we have curated a large-scale dynamic dataset comprising approximately 20,000 diverse 3D models, each featuring complete textures, skeletal rigging, and a wide array of comprehensive animation sequences. To bridge the kinematic gap between 2D visual motion cues and heterogeneous 3D skeletal structures, we propose a structural-semantic injection mechanism. Our model integrates texture and semantic attributes directly into skeletal joint representations. This allows it to map perceived visual dynamics to specific joint hierarchies and their functional roles. This enables MotionDreamer to synthesize high-fidelity animations that maintain anatomical consistency across a vast range of unseen categories, from existing biological species to fantastical beings. Extensive experiments demonstrate that our approach significantly outperforms existing methods, setting a new state-of-the-art benchmark for robust and efficient 4D asset generation. The code will be made publicly available upon acceptance.
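
The abstract does not describe how texture and semantic attributes are combined with joint representations, so the following is only an illustrative sketch and not the authors' implementation. It assumes each joint is represented by one feature vector ("token") that concatenates its rest position, its depth in the joint hierarchy, a texture feature, and a semantic embedding. The function names, the feature dimensions, and the use of plain concatenation are all hypothetical choices made for this example.

```python
import numpy as np


def joint_depths(parents):
    """Return the hierarchy depth of every joint.

    `parents[j]` is the index of joint j's parent, or -1 for the root.
    """
    depths = []
    for p in parents:
        d = 0
        while p != -1:  # walk up to the root, counting edges
            d += 1
            p = parents[p]
        depths.append(d)
    return np.array(depths)


def build_joint_tokens(rest_positions, parents, texture_feats, semantic_feats):
    """Concatenate per-joint structure, texture, and semantic features.

    Hypothetical layout: [xyz | depth | texture | semantic], one row per joint.
    The number of joints is not fixed, so any skeleton topology can be encoded.
    """
    depths = joint_depths(parents).astype(np.float32)[:, None]
    return np.concatenate(
        [rest_positions, depths, texture_feats, semantic_feats], axis=1
    )


# Toy skeleton with 5 joints: a root, a spine joint, two limbs, and a tail.
parents = [-1, 0, 1, 1, 0]
rng = np.random.default_rng(0)
rest_positions = rng.normal(size=(5, 3)).astype(np.float32)
texture_feats = rng.normal(size=(5, 4)).astype(np.float32)   # assumed dimension
semantic_feats = rng.normal(size=(5, 8)).astype(np.float32)  # assumed dimension

tokens = build_joint_tokens(rest_positions, parents, texture_feats, semantic_feats)
print(tokens.shape)
```

Because each skeleton becomes a variable-length set of joint tokens rather than a fixed-size pose vector, the same encoding applies to a biped, a quadruped, or a creature with an unusual joint count, which is the property a category-agnostic model would need.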

Problem

Research questions and friction points this paper is trying to address.

skeletal motion generation

3D rigged shapes

motion animation

4D asset production

morphological generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based animation

category-agnostic motion generation

structural-semantic injection