🤖 AI Summary
Music-to-dance generation faces a fundamental trade-off between real-time performance and the expressive fidelity required for high-quality 3D rendering. To address this, we propose an efficient, physically grounded non-autoregressive generation framework. Our method introduces MeanFlow to accelerate sampling and incorporates physics-based consistency constraints to ensure motion plausibility. We further design a channel-wise cross-modal fusion mechanism and adopt a BiMamba backbone to enhance music–motion alignment and spatiotemporal modeling capacity. Evaluated on AIST++ and FineDance, our approach achieves state-of-the-art motion quality (FID reduced by 12.3%, MM-Dist reduced by 8.7%), accelerates inference by 3.2×, and reduces GPU memory consumption by 41%. Moreover, it enables real-time interactive motion editing and high-fidelity 3D character animation, demonstrating strong practical applicability for immersive virtual environments.
📝 Abstract
Music-to-dance generation aims to translate auditory signals into expressive human motion, with broad applications in virtual reality, choreography, and digital entertainment. Despite promising progress, the limited generation efficiency of existing methods leaves insufficient computational headroom for high-fidelity 3D rendering, constraining the expressiveness of 3D characters in real-world applications. We therefore propose FlowerDance, which not only generates refined motion with physical plausibility and artistic expressiveness, but also achieves significant gains in inference speed and memory efficiency. Specifically, FlowerDance combines MeanFlow with Physical Consistency Constraints, enabling high-quality motion generation in only a few sampling steps. Moreover, FlowerDance adopts a simple but efficient architecture, pairing a BiMamba-based backbone with Channel-Level Cross-Modal Fusion to generate dance in an efficient non-autoregressive manner. FlowerDance also supports motion editing, allowing users to interactively refine dance sequences. Extensive experiments on AIST++ and FineDance show that FlowerDance achieves state-of-the-art results in both motion quality and generation efficiency. Code will be released upon acceptance.
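The abstract credits MeanFlow with enabling generation in only a few sampling steps. As a rough intuition only (not the paper's trained model), the sketch below uses a toy ODE, dz/dt = z, where both the instantaneous velocity and the average velocity over an interval are known in closed form. It shows why predicting the *average* velocity u(z_t, r, t) permits a single-step jump from t = 1 to t = 0, whereas Euler integration of the instantaneous velocity still carries discretization error after several steps. All function names and the toy flow are illustrative assumptions.

```python
import math

def instantaneous_velocity(z, t):
    # Toy curved flow: dz/dt = z, so the exact solution is z(t) = z(1) * exp(t - 1).
    return z

def mean_velocity(z_t, r, t):
    # Closed-form average velocity u = (z_t - z_r) / (t - r) for the toy flow,
    # using z_r = z_t * exp(r - t). A MeanFlow-style network would *learn* this.
    return z_t * (1.0 - math.exp(r - t)) / (t - r)

def euler_sample(z1, steps):
    # Baseline: multi-step Euler integration from t=1 down to t=0
    # using the instantaneous velocity field.
    z, t = z1, 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        z = z - dt * instantaneous_velocity(z, t)
        t -= dt
    return z

def meanflow_sample(z1):
    # One-step sampling: z_0 = z_1 - (1 - 0) * u(z_1, r=0, t=1).
    return z1 - 1.0 * mean_velocity(z1, 0.0, 1.0)

z1 = 2.0
exact = z1 * math.exp(-1.0)     # z(0) for the toy ODE
one_step = meanflow_sample(z1)  # matches the exact endpoint in a single step
euler_4 = euler_sample(z1, 4)   # four Euler steps still miss the endpoint
```

With an exact average-velocity field the one-step sample lands on the true endpoint, while four Euler steps of the instantaneous velocity remain off by roughly 0.1 in this toy; this is the efficiency gap MeanFlow-style training targets.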