🤖 AI Summary
This work addresses the limited performance of existing optical music recognition (OMR) systems on real-world handwritten piano scores. The main cause is a shortage of diverse, realistically annotated training data. Most datasets consist of born-digital notation that does not capture the visual variability of handwritten manuscripts, and manual annotation is prohibitively expensive for the resource-constrained institutions that hold these collections. The authors first establish a baseline for end-to-end OMR on real-world manuscripts with complex piano notation. They then apply domain adaptation using synthetic handwritten score images, generated from fine-grained Music Notation Graph (MuNG) annotations with the Smashcima synthesis tool. Experiments show that some direct transcriptions of in-domain data remain essential, but the synthetic images bring significant improvement. Moreover, the handwritten symbols used for synthesis need not come from the target domain, so expensive fine-grained in-domain annotation can be avoided. The work thus advances the practical applicability of OMR to preserving musical cultural heritage.
📝 Abstract
Optical Music Recognition (OMR) has seen major progress in model design, with end-to-end methods now capable of recognising notation at all levels of complexity. However, the impact of this progress has been limited by the visual domains of available training datasets, which are largely born-digital. The large collections of sheet music held by libraries and other heritage institutions consist predominantly of manuscripts, whose visual domains are highly diverse and differ from those of the training data, so existing OMR systems fail when applied in the real world. These institutions are often resource-constrained, so large in-domain datasets cannot be expected. We provide a first baseline on real-world manuscripts with complex piano notation in this resource-constrained scenario. Using fine-grained music notation graph (MuNG) annotations and the Smashcima synthesis tool, we then show that while some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used for synthesis do not need to be in-domain, so the expensive fine-grained annotation can be avoided. We thus bring OMR closer to one of its stated goals: preserving and promoting musical cultural heritage.