🤖 AI Summary
Music-to-dance generation faces a fundamental trade-off between real-time performance and the expressive fidelity required for high-quality 3D rendering. To address this, we propose an efficient, physically grounded non-autoregressive generation framework. Our method introduces MeanFlow to accelerate sampling and incorporates physics-based consistency constraints to ensure motion plausibility. We further design a channel-wise cross-modal fusion mechanism and adopt a BiMamba backbone to enhance music–motion alignment and spatiotemporal modeling capacity. Evaluated on AIST++ and FineDance, our approach achieves state-of-the-art motion quality (FID reduced by 12.3%, MM-Dist reduced by 8.7%), accelerates inference by 3.2×, and reduces GPU memory consumption by 41%. Moreover, it enables real-time interactive motion editing and high-fidelity 3D character animation, demonstrating strong practical applicability for immersive virtual environments.
📝 Abstract
Music-to-dance generation aims to translate auditory signals into expressive human motion, with broad applications in virtual reality, choreography, and digital entertainment. Despite promising progress, the limited generation efficiency of existing methods leaves insufficient computational headroom for high-fidelity 3D rendering, constraining the expressiveness of 3D characters in real-world applications. We therefore propose FlowerDance, which not only generates refined motion with physical plausibility and artistic expressiveness, but also achieves significant gains in inference speed and memory efficiency. Specifically, FlowerDance combines MeanFlow with Physical Consistency Constraints, enabling high-quality motion generation in only a few sampling steps. Moreover, FlowerDance adopts a simple but efficient architecture, pairing a BiMamba-based backbone with Channel-Level Cross-Modal Fusion to generate dance in an efficient non-autoregressive manner. FlowerDance also supports motion editing, allowing users to interactively refine dance sequences. Extensive experiments on AIST++ and FineDance show that FlowerDance achieves state-of-the-art results in both motion quality and generation efficiency. Code will be released upon acceptance.
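The abstract credits MeanFlow with enabling generation in only a few sampling steps. As a rough intuition only (not the paper's trained model), the sketch below uses a toy ODE, dz/dt = z, where both the instantaneous velocity and the average velocity over an interval are known in closed form. It shows why predicting the *average* velocity u(z_t, r, t) permits a single-step jump from t = 1 to t = 0, whereas Euler integration of the instantaneous velocity still carries discretization error after several steps. All function names and the toy flow are illustrative assumptions.

```python
import math

def instantaneous_velocity(z, t):
    # Toy curved flow: dz/dt = z, so the exact solution is z(t) = z(1) * exp(t - 1).
    return z

def mean_velocity(z_t, r, t):
    # Closed-form average velocity u = (z_t - z_r) / (t - r) for the toy flow,
    # using z_r = z_t * exp(r - t). A MeanFlow-style network would *learn* this.
    return z_t * (1.0 - math.exp(r - t)) / (t - r)

def euler_sample(z1, steps):
    # Baseline: multi-step Euler integration from t=1 down to t=0
    # using the instantaneous velocity field.
    z, t = z1, 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        z = z - dt * instantaneous_velocity(z, t)
        t -= dt
    return z

def meanflow_sample(z1):
    # One-step sampling: z_0 = z_1 - (1 - 0) * u(z_1, r=0, t=1).
    return z1 - 1.0 * mean_velocity(z1, 0.0, 1.0)

z1 = 2.0
exact = z1 * math.exp(-1.0)     # z(0) for the toy ODE
one_step = meanflow_sample(z1)  # matches the exact endpoint in a single step
euler_4 = euler_sample(z1, 4)   # four Euler steps still miss the endpoint
```

With an exact average-velocity field the one-step sample lands on the true endpoint, while four Euler steps of the instantaneous velocity remain off by roughly 0.1 in this toy; this is the efficiency gap MeanFlow-style training targets.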