🤖 AI Summary
Existing text-driven 3D animation methods based on Score Distillation Sampling (SDS) suffer from insufficient motion magnitude and temporal jitter. To address this, we propose Motion Score Distillation (MSD), a novel paradigm that replaces SDS and explicitly models spatiotemporal motion priors. Our approach efficiently fine-tunes a pre-trained video diffusion model via LoRA, augmented with inversion-based noise estimation and spatiotemporal regularization to jointly ensure motion controllability and geometric and appearance consistency. We further introduce a motion refinement module to enhance dynamic detail fidelity. Given only a single static 3D model and a natural-language prompt, our method generates high-fidelity, temporally coherent, and fluid 3D motion sequences. It significantly improves motion amplitude and detail richness while effectively suppressing jitter and geometric distortion. Extensive evaluations demonstrate visually complete and physically plausible animations across diverse textual prompts.
📝 Abstract
We present Animus3D, a text-driven 3D animation framework that generates a motion field given a static 3D asset and a text prompt. Previous methods mostly leverage the vanilla Score Distillation Sampling (SDS) objective to distill motion from a pretrained text-to-video diffusion model, leading to animations with minimal movement or noticeable jitter. To address this, our approach introduces a novel SDS alternative, Motion Score Distillation (MSD). Specifically, we introduce a LoRA-enhanced video diffusion model that defines a static source distribution rather than the pure-noise distribution used in SDS, while an inversion-based noise estimation technique ensures appearance preservation when guiding motion. To further improve motion fidelity, we incorporate explicit temporal and spatial regularization terms that mitigate geometric distortions across time and space. Additionally, we propose a motion refinement module that upscales the temporal resolution and enhances fine-grained details, overcoming the fixed-resolution constraints of the underlying video model. Extensive experiments demonstrate that Animus3D successfully animates static 3D assets from diverse text prompts, generating significantly more substantial and detailed motion than state-of-the-art baselines while maintaining high visual integrity. Code will be released at https://qiisun.github.io/animus3d_page.
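For context, the vanilla SDS objective that MSD replaces is commonly written as the following gradient (this is the standard formulation from the score-distillation literature, not an equation taken from this page; the symbols follow the usual convention, with $\theta$ the differentiable scene/motion parameters, $g$ the renderer, and $\epsilon_\phi$ the diffusion model's noise prediction):

```latex
\nabla_\theta \mathcal{L}_{\text{SDS}}
= \mathbb{E}_{t,\epsilon}\!\left[
    w(t)\,\bigl(\epsilon_\phi(x_t;\,y,\,t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta}
\right],
\qquad x = g(\theta),\quad
x_t = \alpha_t x + \sigma_t \epsilon .
```

Here $y$ is the text prompt, $\epsilon \sim \mathcal{N}(0, I)$ is the injected noise, and $w(t)$ is a timestep-dependent weight. Because the target noise $\epsilon$ is sampled from a pure Gaussian, SDS provides no spatiotemporal structure in its source distribution, which is the limitation the abstract's LoRA-defined static source distribution is described as addressing.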