FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Music-to-dance generation faces a fundamental trade-off between real-time performance and expressive fidelity required for high-quality 3D rendering. To address this, we propose an efficient, physically grounded non-autoregressive generation framework. Our method introduces MeanFlow to accelerate sampling and incorporates physics-based consistency constraints to ensure motion plausibility. We further design a channel-wise cross-modal fusion mechanism and adopt a BiMamba backbone to enhance music–motion alignment and spatiotemporal modeling capacity. Evaluated on AIST++ and FineDance, our approach achieves state-of-the-art motion quality (FID reduced by 12.3%, MM-Dist reduced by 8.7%), accelerates inference by 3.2×, and reduces GPU memory consumption by 41%. Moreover, it enables real-time interactive motion editing and high-fidelity 3D character animation—demonstrating strong practical applicability for immersive virtual environments.

Technology Category

Application Category

📝 Abstract
Music-to-dance generation aims to translate auditory signals into expressive human motion, with broad applications in virtual reality, choreography, and digital entertainment. Despite promising progress, the limited generation efficiency of existing methods leaves insufficient computational headroom for high-fidelity 3D rendering, thereby constraining the expressiveness of 3D characters during real-world applications. Thus, we propose FlowerDance, which not only generates refined motion with physical plausibility and artistic expressiveness, but also achieves significant generation efficiency on inference speed and memory utilization . Specifically, FlowerDance combines MeanFlow with Physical Consistency Constraints, which enables high-quality motion generation with only a few sampling steps. Moreover, FlowerDance leverages a simple but efficient model architecture with BiMamba-based backbone and Channel-Level Cross-Modal Fusion, which generates dance with efficient non-autoregressive manner. Meanwhile, FlowerDance supports motion editing, enabling users to interactively refine dance sequences. Extensive experiments on AIST++ and FineDance show that FlowerDance achieves state-of-the-art results in both motion quality and generation efficiency. Code will be released upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

Generating expressive 3D dance motions from music signals efficiently
Overcoming computational limitations for high-fidelity 3D character rendering
Achieving physical plausibility and artistic expressiveness in dance generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

MeanFlow with Physical Consistency Constraints for motion
BiMamba backbone with cross-modal fusion efficiently
Non-autoregressive generation enables interactive motion editing
🔎 Similar Papers
No similar papers found.
K
Kaixing Yang
Renmin University of China
X
Xulong Tang
Malou Tech Inc
Ziqiao Peng
Ziqiao Peng
Renmin University of China
3D Face AnimationTalking Head Generation
X
Xiangyue Zhang
Wuhan University
P
Puwei Wang
Renmin University of China
J
Jun He
Tsinghua University
Hongyan Liu
Hongyan Liu
Zhejiang University
programable networksnetwork measurementP4 language