🤖 AI Summary
Long-standing challenges in optical music recognition (OMR) include the absence of standardized evaluation benchmarks and of fine-grained assessment metrics. To address them, we introduce SMB, a large-scale, multi-texture, format-consistent sheet music benchmark of 685 pages. We also propose OMR-NED, an OMR-specific metric based on a normalized edit distance computed over Humdrum **kern encodings. OMR-NED quantifies errors at the level of individual musical elements such as note heads, beams, pitches, and accidentals. We further provide standardized data splits and baseline evaluations of state-of-the-art methods. Together, SMB and OMR-NED aim to establish a unified, reproducible evaluation standard for OMR that makes methods easier to compare, offering a publicly available benchmark and baseline reference results for future research.
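To make "symbol-level normalized edit distance" concrete, here is a minimal Python sketch of the classic Symbol Error Rate (SER) computed over **kern tokens. It takes the Levenshtein distance between the predicted and reference token sequences and divides it by the reference length. This is not the OMR-NED definition, whose exact normalization and token handling are not described here. The function names (`kern_tokens`, `edit_distance`, `symbol_error_rate`) and the tokenization rule (split on lines and tabs, skip `!` comment lines) are illustrative assumptions.

```python
# Illustrative sketch of a classic SER over **kern tokens (NOT the OMR-NED definition).
# Function names and the tokenization rule are hypothetical.


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over token sequences (unit cost insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute (cost 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def kern_tokens(kern: str) -> list[str]:
    """Split **kern text into tokens line by line and tab by tab, skipping comment lines."""
    tokens = []
    for line in kern.strip().splitlines():
        if line.startswith("!"):
            continue  # Humdrum comment lines carry no notation
        tokens.extend(line.split("\t"))
    return tokens


def symbol_error_rate(ref_kern: str, hyp_kern: str) -> float:
    """Edit distance normalized by the number of reference tokens."""
    ref, hyp = kern_tokens(ref_kern), kern_tokens(hyp_kern)
    return edit_distance(ref, hyp) / max(len(ref), 1)


reference = "**kern\n*clefG2\n4c\n4d\n4e\n*-"
prediction = "**kern\n*clefG2\n4c\n4d#\n*-"  # 4d misread as 4d#, 4e missed
print(symbol_error_rate(reference, prediction))  # 0.3333333333333333 (2 edits / 6 tokens)
```

Because every token counts equally, this score cannot say whether an error came from a pitch, a duration, or an accidental. That gap is what a finer-grained metric is meant to close.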
📝 Abstract
In this work, we introduce the Sheet Music Benchmark (SMB), a dataset of 685 pages specifically designed for benchmarking Optical Music Recognition (OMR) research. SMB covers a diverse range of musical textures, including monophony, pianoform, quartet, and others, all encoded in Common Western Modern Notation using the Humdrum **kern format. Alongside SMB, we introduce the OMR Normalized Edit Distance (OMR-NED), a new metric designed specifically for evaluating OMR performance. OMR-NED builds on the widely used Symbol Error Rate (SER) and provides a fine-grained error analysis covering individual musical elements such as note heads, beams, pitches, accidentals, and other critical notation features. The single numeric score OMR-NED produces allows clear comparisons, enabling researchers and end users alike to identify the best-performing OMR approaches. Our work thus addresses a long-standing gap in OMR evaluation, and we support our contributions with baseline experiments that train and assess state-of-the-art methods on standardized SMB dataset splits.
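A fine-grained error analysis can be illustrated with a hypothetical decomposition, offered as an assumption rather than the OMR-NED procedure. First, parse each **kern note or rest token into its duration, pitch, and accidental. Then compute a separate normalized edit distance for each component stream. The regular expression `NOTE_RE` and the helper names are invented for this sketch. It handles only simple tokens such as `4c`, `8.cc-`, or `2GG#`, and it ignores interpretations, barlines, ties, beams, and articulations.

```python
import re

# Hypothetical parser for simple **kern note/rest tokens (NOT the OMR-NED procedure):
# optional duration digits and dots, pitch letters (or "r" for a rest), then accidentals.
NOTE_RE = re.compile(r"^(\d+\.*)?([a-gA-G]+|r)([#\-n]*)")


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over sequences with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def component_streams(tokens: list[str]) -> dict[str, list[str]]:
    """Split note/rest tokens into parallel duration, pitch, and accidental streams."""
    streams: dict[str, list[str]] = {"duration": [], "pitch": [], "accidental": []}
    for tok in tokens:
        m = NOTE_RE.match(tok)
        if m is None:
            continue  # interpretations (*...), barlines (=...), null tokens, etc. are skipped
        duration, pitch, accidental = m.groups()
        streams["duration"].append(duration or "")
        streams["pitch"].append(pitch)
        streams["accidental"].append(accidental)
    return streams


def per_component_error(ref_tokens: list[str], hyp_tokens: list[str]) -> dict[str, float]:
    """Normalized edit distance computed independently for each component stream."""
    ref_s, hyp_s = component_streams(ref_tokens), component_streams(hyp_tokens)
    return {
        name: edit_distance(ref_s[name], hyp_s[name]) / max(len(ref_s[name]), 1)
        for name in ref_s
    }


ref = ["4c", "4d", "4e", "2f#"]
hyp = ["4c", "4d#", "4e", "2g"]
print(per_component_error(ref, hyp))
# {'duration': 0.0, 'pitch': 0.25, 'accidental': 0.5}
```

In this toy scheme, reading `2f#` as `2g` is charged to both the pitch and the accidental streams. A real metric would have to decide how such coupled errors are weighted and how the per-element scores combine into a single number.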