🤖 AI Summary
Existing text-driven 3D animation methods based on Score Distillation Sampling (SDS) suffer from insufficient motion magnitude and temporal jitter. To address this, we propose Motion Score Distillation (MSD), a novel paradigm that replaces SDS and explicitly models spatiotemporal motion priors. Our approach efficiently fine-tunes a pre-trained video diffusion model via LoRA, augmented with inversion-based noise estimation and spatiotemporal regularization to jointly ensure motion controllability and geometric and appearance consistency. We further introduce a motion refinement module to enhance dynamic detail fidelity. Given only a single static 3D model and a natural-language prompt, our method generates high-fidelity, temporally coherent, and fluid 3D motion sequences. It significantly improves motion amplitude and detail richness while effectively suppressing jitter and geometric distortion. Extensive evaluations demonstrate visually complete and physically plausible animations across diverse textual prompts.
📝 Abstract
We present Animus3D, a text-driven 3D animation framework that generates a motion field given a static 3D asset and a text prompt. Previous methods mostly leverage the vanilla Score Distillation Sampling (SDS) objective to distill motion from a pretrained text-to-video diffusion model, leading to animations with minimal movement or noticeable jitter. To address this, our approach introduces a novel SDS alternative, Motion Score Distillation (MSD). Specifically, we introduce a LoRA-enhanced video diffusion model that defines a static source distribution rather than the pure-noise distribution used in SDS, while an inversion-based noise estimation technique ensures appearance preservation when guiding motion. To further improve motion fidelity, we incorporate explicit temporal and spatial regularization terms that mitigate geometric distortions across time and space. Additionally, we propose a motion refinement module that upscales the temporal resolution and enhances fine-grained details, overcoming the fixed-resolution constraints of the underlying video model. Extensive experiments demonstrate that Animus3D successfully animates static 3D assets from diverse text prompts, generating significantly more substantial and detailed motion than state-of-the-art baselines while maintaining high visual integrity. Code will be released at https://qiisun.github.io/animus3d_page.
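For context, the vanilla SDS objective that MSD replaces is commonly written as the following gradient (this is the standard formulation from the score-distillation literature, not an equation taken from this page; the symbols follow the usual convention, with $\theta$ the differentiable scene/motion parameters, $g$ the renderer, and $\epsilon_\phi$ the diffusion model's noise prediction):

```latex
\nabla_\theta \mathcal{L}_{\text{SDS}}
= \mathbb{E}_{t,\epsilon}\!\left[
    w(t)\,\bigl(\epsilon_\phi(x_t;\,y,\,t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta}
\right],
\qquad x = g(\theta),\quad
x_t = \alpha_t x + \sigma_t \epsilon .
```

Here $y$ is the text prompt, $\epsilon \sim \mathcal{N}(0, I)$ is the injected noise, and $w(t)$ is a timestep-dependent weight. Because the target noise $\epsilon$ is sampled from a pure Gaussian, SDS provides no spatiotemporal structure in its source distribution, which is the limitation the abstract's LoRA-defined static source distribution is described as addressing.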