🤖 AI Summary
Existing motion generation methods struggle with precise, disentangled control over multiple attributes—such as style, trajectory, and fine-grained textual descriptions—and generalize poorly to unseen actions. To address this, we propose the first controllable motion generation framework enabling independent modulation of multiple attributes. It comprises three core components: (1) an attribute-disentangled diffusion model that orthogonally decomposes motion representations; (2) a lightweight Motion Adapter for low-resource fine-tuning and rapid adaptation; and (3) an LLM Planner that leverages local knowledge to achieve cross-domain semantic alignment and prompt-driven generation. Our method enables end-to-end text-to-motion generation with explicit, real-time control over stylistic and structural attributes. It achieves state-of-the-art performance on text-to-motion benchmarks, supports zero-shot and few-shot generalization to unseen actions, and significantly improves controllability, generalization, and human–AI interaction efficiency.
📝 Abstract
Attributes such as style, fine-grained text, and trajectory are specific conditions for describing motion. However, existing methods often lack precise user control over motion attributes and generalize poorly to unseen motions. This work introduces ACMo, an Attribute-Controllable Motion generation architecture that addresses these challenges by decoupling arbitrary conditions and controlling them separately. First, we explore the Attribute Diffusion Model, which improves text-to-motion performance by decoupling text and motion learning, since the controllable model relies heavily on the pre-trained backbone. Then, we introduce the Motion Adapter to quickly fine-tune on previously unseen motion patterns; its motion-prompt inputs enable multimodal text-to-motion generation that captures user-specified styles. Finally, we propose an LLM Planner that bridges the gap between unseen attributes and dataset-specific texts via local knowledge, enabling user-friendly interaction. Our approach introduces motion prompts for stylized generation, enabling fine-grained and user-friendly attribute control while delivering performance comparable to state-of-the-art methods. Project page: https://mjwei3d.github.io/ACMo/
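The three components described above compose into a single generation pipeline: the LLM Planner first rewrites a free-form user prompt into dataset-aligned text, the diffusion model generates a base motion from that text, and the Motion Adapter injects the user-specified style. The following is a minimal, purely illustrative Python sketch of that flow; every class, method, and name (`LLMPlanner`, `AttributeDiffusionModel`, `MotionAdapter`, `acmo_generate`) is a hypothetical stand-in, not the paper's actual API.

```python
# Hypothetical sketch of the three-stage pipeline from the abstract.
# All classes below are illustrative stubs, not the real ACMo implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Motion:
    frames: list                 # stand-in for a generated motion sequence
    style: Optional[str] = None  # style attribute injected by the adapter


class LLMPlanner:
    """Maps free-form user text to dataset-aligned text (stand-in for the
    paper's local-knowledge-based planner)."""
    def __init__(self, local_knowledge: dict):
        self.local_knowledge = local_knowledge

    def plan(self, prompt: str) -> str:
        # Look up a dataset-specific phrasing for an unseen attribute;
        # fall back to the raw prompt when no mapping exists.
        return self.local_knowledge.get(prompt, prompt)


class AttributeDiffusionModel:
    """Stand-in for the pre-trained text-to-motion diffusion backbone."""
    def generate(self, text: str, num_frames: int = 4) -> Motion:
        # Placeholder: emit one dummy frame per denoising step.
        return Motion(frames=[f"{text}#{i}" for i in range(num_frames)])


class MotionAdapter:
    """Stand-in for the lightweight adapter that conditions generation
    on a motion/style prompt."""
    def __init__(self, style: str):
        self.style = style

    def apply(self, motion: Motion) -> Motion:
        return Motion(frames=motion.frames, style=self.style)


def acmo_generate(prompt: str, planner: LLMPlanner,
                  model: AttributeDiffusionModel,
                  adapter: Optional[MotionAdapter] = None) -> Motion:
    aligned_text = planner.plan(prompt)      # 1) semantic alignment
    motion = model.generate(aligned_text)    # 2) base text-to-motion
    if adapter is not None:                  # 3) optional style control
        motion = adapter.apply(motion)
    return motion
```

Because each stage only consumes the previous stage's output, the adapter can be dropped (plain text-to-motion), swapped (a different style), or fine-tuned independently of the backbone, which is the decoupling the abstract emphasizes.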