PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of high-quality benchmarks and methods for automatic hand-motion and fingering generation in piano pedagogy, this paper introduces the first large-scale benchmark for piano-playing hand-motion generation, comprising 116 hours of overhead-view videos and 10 million annotated hand poses. It proposes an audio-driven two-stage generative framework: a position predictor extracts temporally constrained key-press sequences, while a position-guided gesture generator models multi-scale hand motion. It also designs a multidimensional evaluation suite that assesses motion similarity, smoothness, left- and right-hand positional accuracy, and distribution fidelity, with the aim of supporting AI-assisted fingering instruction. Experiments demonstrate significant improvements over baselines in both fidelity and naturalness. The dataset, code, and models are open-sourced to accelerate AI-driven music education research and deployment.

📝 Abstract
Recently, artificial intelligence techniques for education have received increasing attention, yet designing effective musical instrument instruction systems remains an open problem. Although key presses can be derived directly from sheet music, the transitional movements between key presses require more extensive guidance in piano performance. In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses. We also introduce a powerful baseline model that generates hand motions from piano audio through a position predictor and a position-guided gesture generator. Furthermore, a series of evaluation metrics is designed to assess the performance of the baseline model, including motion similarity, smoothness, positional accuracy of the left and right hands, and overall fidelity of the movement distribution. Although piano key presses are already available from music scores or audio, PianoMotion10M aims to provide guidance on piano fingering for instructional purposes. The source code and dataset can be accessed at https://github.com/agnJason/PianoMotion10M.
Problem

Research questions and friction points this paper is trying to address.

Hand motion generation in piano
Guidance for piano fingering
Dataset for piano performance videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

PianoMotion10M dataset creation
Baseline model for motion generation
Evaluation metrics for hand movements
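
The abstract names smoothness and left/right-hand positional accuracy among the evaluation metrics but does not give formulas. As a rough illustration only, the sketch below shows one common way such quantities could be computed from a pose sequence: smoothness as the mean joint acceleration magnitude (second finite difference) and positional accuracy as the mean Euclidean joint error. The function names, the (frames, joints, 3) array layout, and the 30 fps default are assumptions made for this example, not the paper's definitions.

```python
import numpy as np


def smoothness(poses, fps=30.0):
    """Mean joint acceleration magnitude; lower means smoother motion.

    poses: array of shape (frames, joints, 3). Layout and fps are assumptions.
    """
    poses = np.asarray(poses, dtype=float)
    if poses.ndim != 3 or poses.shape[0] < 3:
        raise ValueError("expected (frames >= 3, joints, 3)")
    # Second finite difference along time approximates acceleration.
    accel = np.diff(poses, n=2, axis=0) * fps ** 2
    return float(np.linalg.norm(accel, axis=-1).mean())


def mean_position_error(pred, target):
    """Mean Euclidean distance between predicted and reference joints."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

Under these assumptions, each hand would be scored separately by passing its own pose array, and a constant-velocity motion yields a smoothness value of zero.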