Snapping Matters: Context-Aware Onset Refinement for Automatic Music Transcription

📅 2026-06-10
🤖 AI Summary
This study addresses the challenge posed by imprecise note onset annotations in weakly aligned score–audio data, which significantly limits the performance of automatic music transcription. It presents the first systematic analysis of the impact of onset “snapping” during cross-instrument transcription training and introduces a global context–aware optimization method. The approach formulates snapping as a pitch-wise assignment problem, constructing a bipartite graph from neural network posteriorgrams and dynamic time warping alignments, and replaces conventional greedy strategies with optimal matching within context-sensitive temporal windows. Evaluated on piano, chamber, and orchestral datasets, the method substantially improves both onset alignment accuracy and overall transcription performance, with particularly pronounced gains under wide snapping windows or coarse initial alignments.
📝 Abstract
Precise note-level annotations are critical for training automatic music transcription (AMT) systems, in particular note-onset labels, which form a core component of many recent AMT systems. However, high-quality annotations for real-world recordings are scarce. Sequence-level score–audio alignment methods such as dynamic time warping provide only coarse correspondence, making a local refinement step necessary. This refinement step, known as snapping, adjusts aligned score onsets using peaks in a neural onset posteriorgram and often determines whether weakly aligned score–audio pairs become usable training data at all. Despite its practical importance, snapping is typically treated as a simple post-processing heuristic and implemented with greedy local decisions. We present a systematic analysis of snapping strategies for training instrument-agnostic transcribers, demonstrating that snapping is essential for learning from weakly aligned data. Building on this, we formulate snapping as a per-pitch assignment problem and solve it via bipartite graph matching, yielding context-aware onset decisions under overlapping refinement windows and uncertain initial alignments. Extensive cross-dataset experiments across piano, chamber, and orchestral recordings show improved onset alignment and transcription accuracy over greedy snapping, with gains increasing for wider snapping windows and coarser initial alignments. Qualitative examples are provided on our project page: https://abhirupsaha8.github.io
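To make the contrast between greedy snapping and a per-pitch assignment concrete, here is a minimal Python sketch for a single pitch. It is not the authors' implementation: the function names, the use of integer frame indices, the absolute-distance cost, and the `skip_cost` penalty for leaving an onset unsnapped are all assumptions made for illustration, and the minimum-cost matching is found by exhaustive search rather than by an efficient bipartite matching solver. The abstract does not specify the cost function, which presumably also draws on posteriorgram peak strength.

```python
from itertools import permutations

def greedy_snap(onsets, peaks, window):
    """Greedy baseline (illustrative): each aligned onset, in order,
    takes the nearest still-unused peak within +/- window frames."""
    used, snapped = set(), []
    for t in onsets:
        candidates = [(abs(p - t), j) for j, p in enumerate(peaks)
                      if j not in used and abs(p - t) <= window]
        if candidates:
            _, j = min(candidates)
            used.add(j)
            snapped.append(peaks[j])
        else:
            snapped.append(t)  # no peak in the window: keep the aligned onset
    return snapped

def matching_snap(onsets, peaks, window, skip_cost=None):
    """Minimum-cost one-to-one assignment of onsets to peaks (illustrative).
    Exhaustive search, so only suitable for a handful of notes."""
    # Assumption: leaving an onset unsnapped costs as much as the window width.
    skip_cost = window if skip_cost is None else skip_cost
    n = len(onsets)
    # One "skip" slot (None) per onset, so any onset may stay unsnapped.
    slots = list(range(len(peaks))) + [None] * n
    best_cost, best_perm = float("inf"), None
    for perm in permutations(slots, n):
        cost = 0
        for t, j in zip(onsets, perm):
            if j is None:
                cost += skip_cost
            elif abs(peaks[j] - t) <= window:
                cost += abs(peaks[j] - t)
            else:
                cost = float("inf")  # peak lies outside this onset's window
                break
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [t if j is None else peaks[j] for t, j in zip(onsets, best_perm)]

# Two aligned onsets of the same pitch with overlapping windows (frames).
onsets = [100, 110]
peaks = [90, 104]
print(greedy_snap(onsets, peaks, window=15))    # [104, 110]
print(matching_snap(onsets, peaks, window=15))  # [90, 104]
```

In this toy case the greedy pass lets the first onset claim the peak at frame 104, leaving the second onset with no peak inside its window, whereas the joint assignment gives each onset its own peak. This is the kind of overlapping-window situation the abstract describes; for realistic sequence lengths the exhaustive search would need to be replaced by a proper assignment solver.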
Problem

Research questions and friction points this paper is trying to address.

automatic music transcription
note-onset refinement
score–audio alignment
snapping
weakly aligned data
Innovation

Methods, ideas, or system contributions that make the work stand out.

snapping
context-aware onset refinement
bipartite graph matching
automatic music transcription
weakly aligned data