AI Summary
This work addresses the limitations of existing robotic string-instrument performance, which relies on costly motion capture systems and lacks human-like sight-reading capability. We propose the first end-to-end framework that maps MIDI sheet music directly to bowing motions for a UR5e robotic arm, combining the Freedrive teaching mode with Real-Time Data Exchange (RTDE) to enable collision-aware playing without motion capture. Residual reinforcement learning is introduced to enhance musical expressiveness. Our system achieves the first demonstration of human-like sight-reading of standard musical scores by a robot and introduces the "Musical Turing Test" as an evaluation benchmark. Effective performance across five pieces is validated through a blind listening test involving 132 participants, whose ratings indicate expressiveness approaching human levels. All recorded joint trajectories and reference audio recordings are publicly released to support future research in robotic music performance.
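The residual reinforcement learning idea mentioned above can be sketched as a learned correction added on top of a nominal bowing controller. The state fields, the two-dimensional action layout, and the linear residual below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import numpy as np

def base_bowing_controller(state):
    # Nominal action derived from the score: [bow velocity, pressure offset]
    return np.array([state["target_bow_velocity"], 0.0])

def residual_policy(state, weights):
    # Tiny linear residual over hand-picked features; in practice this
    # would be a trained network updated by reinforcement learning
    features = np.array([state["target_bow_velocity"], state["string_index"]])
    return weights @ features

def control(state, weights, alpha=0.1):
    # Residual RL: the learned policy only corrects the baseline controller,
    # so an untrained (zero) residual reproduces the baseline behavior
    return base_bowing_controller(state) + alpha * residual_policy(state, weights)

state = {"target_bow_velocity": 0.2, "string_index": 2}
print(control(state, np.zeros((2, 2))))  # zero residual -> baseline [0.2, 0.0]
```

Starting from a competent baseline and learning only a small correction is what makes residual RL attractive here: the robot keeps playing acceptably while the policy explores refinements to expressiveness.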
Abstract
Robot musicians require precise control to achieve note accuracy, sound quality, and musical expression. Performing string instruments such as the violin and cello is particularly challenging because of the fine control over bow angle and pressure needed to produce the desired sound. While prior robotic cellists have focused on accurate bowing trajectories, they often rely on expensive motion capture techniques and fail to sight-read music in a human-like way. We propose a novel end-to-end MIDI-score-to-robotic-motion pipeline that converts musical input directly into collision-aware bowing motions for a UR5e robot cellist. By using the Universal Robots Freedrive feature, our robotic musician achieves human-like sound without the need for motion capture. Additionally, this work records live joint data via Real-Time Data Exchange (RTDE) as the robot plays, providing the research community with labeled robotic playing data from a collection of five standard pieces. To demonstrate the effectiveness of our method in comparison to human performers, we introduce the Musical Turing Test, in which 132 human participants evaluate our robot's performance against a human baseline. Human reference recordings are also released, enabling direct comparison in future studies. This evaluation technique establishes the first benchmark for robotic cello performance. Finally, we outline a residual reinforcement learning methodology to improve upon the baseline robotic controls, highlighting future opportunities for improved string-crossing efficiency and sound quality.
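As a rough illustration of one decision a MIDI-score-to-motion front end must make, the sketch below assigns each MIDI note to a cello string (standard C2-G2-D3-A3 tuning, MIDI pitches 36/43/50/57) and alternates bow direction per note. The string-selection heuristic and function names are illustrative assumptions, not the paper's actual pipeline:

```python
# Open-string MIDI pitches for standard cello tuning
CELLO_OPEN_STRINGS = {"C": 36, "G": 43, "D": 50, "A": 57}

def assign_string(midi_note):
    """Pick the highest open string at or below the note (low-position bias)."""
    candidates = {s: p for s, p in CELLO_OPEN_STRINGS.items() if p <= midi_note}
    if not candidates:
        raise ValueError("note below cello range")
    return max(candidates, key=candidates.get)

def bow_plan(midi_notes):
    # Alternate bow direction per note; a change of string between
    # consecutive notes implies a string-crossing motion for the arm
    plan, direction = [], "down"
    for n in midi_notes:
        plan.append((n, assign_string(n), direction))
        direction = "up" if direction == "down" else "down"
    return plan

print(bow_plan([57, 60, 50]))
# [(57, 'A', 'down'), (60, 'A', 'up'), (50, 'D', 'down')]
```

Even this toy version shows why string crossings matter for efficiency: the note-to-string assignment determines how often the bowing arm must reorient, which is exactly the cost the residual learning stage could target.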