KuiSCIMA v2.0: Improved Baselines, Calibration, and Cross-Notation Generalization for Historical Chinese Music Notations in Jiang Kui's Baishidaoren Gequ

📅 2025-07-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Optical music recognition (OMR) of historical Chinese musical notations, particularly *suzipu* and *lülüpu* (pitch-name notation), faces severe challenges due to scarce training data and extreme class imbalance across 77 distinct *suzipu* character classes. Method: This paper introduces an enhanced character recognition architecture for scarce, imbalanced data, integrating temperature scaling for confidence calibration and leave-one-edition-out cross-validation to mitigate edition-specific bias. Contribution/Results: The approach significantly improves robustness in transcribing Jiang Kui's *Baishidaoren Gequ*: the *suzipu* Character Error Rate drops from 10.4% to 7.1%, surpassing human transcribers; the *lülüpu* error rate reaches 0.9%; and the Expected Calibration Error stays below 0.0162. All 109 pieces have been annotated and integrated into a high-quality, publicly accessible dataset. This work establishes a scalable, reproducible technical paradigm for digitizing and revitalizing historical Chinese musical manuscripts.
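The calibration step mentioned above can be illustrated with a minimal sketch. Temperature scaling divides a model's logits by a single scalar T fitted on held-out data, and Expected Calibration Error (ECE) measures the gap between confidence and accuracy over confidence bins. This is a generic, hypothetical illustration (grid-search fit, 15 bins), not the paper's implementation:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # negative log-likelihood of the temperature-scaled probabilities
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # pick the T that minimizes validation NLL (grid search for simplicity;
    # the common alternative is gradient-based optimization of T)
    return min(grid, key=lambda T: nll(logits, labels, T))

def expected_calibration_error(probs, labels, n_bins=15):
    conf = probs.max(axis=1)      # predicted confidence
    pred = probs.argmax(axis=1)   # predicted class
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == labels[mask]).mean()
            # weight each bin's |accuracy - confidence| gap by its size
            ece += mask.mean() * abs(acc - conf[mask].mean())
    return ece
```

Because T is a single scalar that rescales all logits uniformly, it changes confidences without changing the argmax, so calibration improves while the CER is untouched.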

📝 Abstract
Optical Music Recognition (OMR) for historical Chinese musical notations, such as suzipu and lülüpu, presents unique challenges due to high class imbalance and limited training data. This paper introduces significant advancements in OMR for Jiang Kui's influential collection Baishidaoren Gequ from 1202. In this work, we develop and evaluate a character recognition model for scarce imbalanced data. We improve upon previous baselines by reducing the Character Error Rate (CER) from 10.4% to 7.1% for suzipu, despite working with 77 highly imbalanced classes, and achieve a remarkable CER of 0.9% for lülüpu. Our models outperform human transcribers, with an average human CER of 15.9% and a best-case CER of 7.6%. We employ temperature scaling to achieve a well-calibrated model with an Expected Calibration Error (ECE) below 0.0162. Using a leave-one-edition-out cross-validation approach, we ensure robust performance across five historical editions. Additionally, we extend the KuiSCIMA dataset to include all 109 pieces from Baishidaoren Gequ, encompassing suzipu, lülüpu, and jianzipu notations. Our findings advance the digitization and accessibility of historical Chinese music, promoting cultural diversity in OMR and expanding its applicability to underrepresented music traditions.
Problem

Research questions and friction points this paper is trying to address.

Improves Optical Music Recognition for historical Chinese notations
Reduces Character Error Rate in imbalanced suzipu and lülüpu data
Enhances cross-notation generalization and model calibration
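The Character Error Rate quoted throughout (10.4% to 7.1% for suzipu, 0.9% for lülüpu) is the standard edit-distance metric: Levenshtein distance between the predicted and reference character sequences, normalized by the reference length. A minimal sketch, assuming the standard definition rather than any paper-specific variant:

```python
def cer(reference, hypothesis):
    """Character Error Rate: Levenshtein distance / len(reference)."""
    m, n = len(reference), len(hypothesis)
    # single-row dynamic programming over the edit-distance table
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))
            prev = cur
    return dp[n] / max(m, 1)
```

For example, one wrong character out of three reference characters gives a CER of 1/3; an exact transcription gives 0.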
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved character recognition for imbalanced data
Temperature scaling for well-calibrated model
Extended dataset with multiple notation types