🤖 AI Summary
Existing text-to-motion generation methods rely on end-to-end mapping, resulting in shallow semantic understanding, weak logical reasoning, poor action controllability, insufficient long-horizon consistency, and limited motion diversity. To address these limitations, the authors propose Motion-R1, a unified motion-language modeling framework that integrates Chain-of-Thought (CoT) reasoning with reinforcement learning. The approach explicitly decomposes natural language instructions into structured action paths and adopts Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm designed for large models, to jointly optimize CoT-based reasoning chain generation and motion synthesis using motion quality feedback. Leveraging large language models, the method performs multi-step semantic decomposition and action path planning. Evaluated on multiple benchmarks, it achieves competitive or superior performance relative to the state of the art, notably improving long-horizon coherence, instruction fidelity, and motion diversity. The code, model, and data will be publicly released.
📝 Abstract
Recent advances in large language models, especially in natural language understanding and reasoning, have opened new possibilities for text-to-motion generation. Although existing approaches have made notable progress in semantic alignment and motion synthesis, they often rely on end-to-end mapping strategies that fail to capture deep linguistic structures and logical reasoning. Consequently, generated motions tend to lack controllability, consistency, and diversity. To address these limitations, we propose Motion-R1, a unified motion-language modeling framework that integrates a Chain-of-Thought mechanism. By explicitly decomposing complex textual instructions into logically structured action paths, Motion-R1 provides high-level semantic guidance for motion generation, significantly enhancing the model's ability to interpret and execute multi-step, long-horizon, and compositionally rich commands. To train our model, we adopt Group Relative Policy Optimization, a reinforcement learning algorithm designed for large models, which leverages motion quality feedback to jointly optimize reasoning-chain generation and motion synthesis. Extensive experiments across multiple benchmark datasets demonstrate that Motion-R1 achieves competitive or superior performance compared to state-of-the-art methods, particularly in scenarios requiring nuanced semantic understanding and long-term temporal coherence. The code, model, and data will be publicly available.
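As a rough illustration of the training signal the abstract describes: GRPO, as described in the general RL literature (not detailed in this abstract), replaces a learned value baseline with a group-relative one. For each instruction, a group of candidate outputs is sampled, each is scored (here, by a motion-quality reward), and each sample's advantage is its reward normalized against the group's mean and standard deviation. The sketch below shows only that advantage computation; the reward function and policy update are assumptions, not specified by the paper's abstract.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled output's reward
    against the mean and std of its own sampling group, so no separate
    value network (critic) is needed as a baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # uniform rewards carry no ranking signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Hypothetical motion-quality rewards for 4 motions sampled
# from the same text instruction:
advs = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Samples scoring above their group's mean get positive advantages (their reasoning chains and motions are reinforced); below-mean samples are penalized, which is how reasoning and synthesis are optimized jointly from a single scalar feedback.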