BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference

📅 2025-11-25
🤖 AI Summary
Existing automatic pitch correction (APC) methods either rely on reference pitch contours, which limits their practical applicability, or employ simple pitch estimation that fails to balance accuracy and vocal expressiveness. This paper proposes BERT-APC, a reference-free pitch correction framework that the authors describe as the first APC model to use a music language model for this purpose. The method integrates: (1) a stationary pitch predictor that estimates the perceived pitch of each note; (2) context-aware note pitch prediction with a music language model; and (3) a note-level correction algorithm that preserves intentional pitch deviations used for emotional expression. A learnable data augmentation strategy additionally simulates realistic detuning patterns. On highly detuned samples, BERT-APC achieves 10.49 percentage points higher raw pitch accuracy than ROSVOT, the second-best model, and it obtains a Mean Opinion Score (MOS) of 4.32, significantly higher than the commercial tools Auto-Tune (3.22) and Melodyne (3.08), while maintaining a comparable ability to preserve expressive nuances.
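To make the augmentation idea concrete, the Python sketch below pairs a clean note-pitch sequence with a synthetically detuned copy, the kind of (input, target) pair a note pitch predictor could be trained on. It is a simplification under stated assumptions: the paper's strategy is learnable and models realistic detuning patterns, whereas this sketch draws fixed, independent Gaussian offsets per note. The function `make_detuned_pair` and its parameters (`p_detune`, `sigma_semitones`, `max_abs_semitones`) are hypothetical.

```python
import random


def make_detuned_pair(clean_pitches, p_detune=0.5, sigma_semitones=0.4,
                      max_abs_semitones=1.5, seed=None):
    """Return (detuned, clean) note-pitch lists in MIDI note numbers.

    Hypothetical, NON-learnable stand-in for a detuning augmentation:
    each note is detuned with probability p_detune by a Gaussian offset
    (standard deviation sigma_semitones), clipped to +/- max_abs_semitones.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    detuned = []
    for pitch in clean_pitches:
        offset = 0.0
        if rng.random() < p_detune:
            offset = rng.gauss(0.0, sigma_semitones)
            # Clip extreme draws to keep offsets within a plausible range
            offset = max(-max_abs_semitones, min(max_abs_semitones, offset))
        detuned.append(pitch + offset)
    return detuned, list(clean_pitches)


# Usage: a C-major phrase as MIDI note numbers (60 = middle C)
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]
noisy, target = make_detuned_pair(melody, seed=0)
```

A learned augmentation would replace the fixed Gaussian with offsets whose statistics (for example, which notes tend to go flat or sharp) are fitted to real detuned singing.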


📝 Abstract
Automatic Pitch Correction (APC) enhances vocal recordings by aligning pitch deviations with the intended musical notes. However, existing APC systems either rely on reference pitches, which limits their practical applicability, or employ simple pitch estimation algorithms that often fail to preserve expressiveness and naturalness. We propose BERT-APC, a novel reference-free APC framework that corrects pitch errors while maintaining the natural expressiveness of vocal performances. In BERT-APC, a novel stationary pitch predictor first estimates the perceived pitch of each note from the detuned singing voice. A context-aware note pitch predictor estimates the intended pitch sequence by leveraging a music language model repurposed to incorporate musical context. Finally, a note-level correction algorithm fixes pitch errors while preserving intentional pitch deviations for emotional expression. In addition, we introduce a learnable data augmentation strategy that improves the robustness of the music language model by simulating realistic detuning patterns. Compared to two recent singing voice transcription models, BERT-APC demonstrated superior performance in note pitch prediction, outperforming the second-best model, ROSVOT, by 10.49%p on highly detuned samples in terms of the raw pitch accuracy. In the MOS test, BERT-APC achieved the highest score of $4.32 \pm 0.15$, which is significantly higher than those of the widely-used commercial APC tools, AutoTune ($3.22 \pm 0.18$) and Melodyne ($3.08 \pm 0.18$), while maintaining a comparable ability to preserve expressive nuances. To the best of our knowledge, this is the first APC model that leverages a music language model to achieve reference-free pitch correction with symbolic musical context. The corrected audio samples of BERT-APC are available online.
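The abstract reports gains in raw pitch accuracy without defining the metric here. A common definition in melody-extraction evaluation (used, for example, by `mir_eval.melody.raw_pitch_accuracy`) is the fraction of voiced reference frames whose estimated pitch lies within 50 cents of the reference. The plain-Python sketch below assumes that definition and frame-aligned f0 tracks; whether the paper uses exactly this tolerance is an assumption. A "%p" gain is a difference in percentage points, not a relative change.

```python
import math


def raw_pitch_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Fraction of voiced reference frames whose estimate is within tolerance_cents.

    Assumes ref_hz and est_hz are frame-aligned f0 tracks in Hz, where a value
    <= 0 marks an unvoiced frame. In this sketch an unvoiced estimate on a voiced
    reference frame simply counts as a miss.
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("reference and estimate must have the same number of frames")
    voiced = correct = 0
    for ref, est in zip(ref_hz, est_hz):
        if ref <= 0:
            continue  # only voiced reference frames are scored
        voiced += 1
        # Pitch error in cents: 1200 * log2(f_est / f_ref)
        if est > 0 and abs(1200.0 * math.log2(est / ref)) <= tolerance_cents:
            correct += 1
    return correct / voiced if voiced else 0.0


# Usage: 3 voiced frames; the 30-cent error is accepted, the 60-cent error is not.
ref = [440.0, 440.0, 0.0, 220.0]        # 0 Hz marks an unvoiced frame
est = [440.0 * 2 ** (30 / 1200),         # 30 cents sharp
       440.0 * 2 ** (60 / 1200),         # 60 cents sharp
       0.0,
       220.0]
rpa = raw_pitch_accuracy(ref, est)       # 2 of 3 voiced frames are correct
```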
Problem

Research questions and friction points this paper is trying to address.

Correcting pitch errors without relying on reference pitches
Preserving natural expressiveness in vocal performances
Inferring intended pitch sequences using musical context
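To illustrate what musical context can add over note-by-note rounding, the sketch below infers a major key from a whole phrase and then snaps each detuned note to the nearest pitch in that key. This is a deliberately simple, rule-based stand-in and not BERT-APC's method, which uses a music language model; the functions `best_major_key` and `snap_to_key` are hypothetical, and melodies with accidentals or in minor keys would defeat this rule.

```python
import math

MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of the major scale from its tonic


def best_major_key(pitches):
    """Return the tonic pitch class (0 = C ... 11 = B) whose major scale
    contains the most notes of the phrase after rounding to the nearest semitone."""
    pitch_classes = [round(p) % 12 for p in pitches]

    def coverage(tonic):
        scale = {(tonic + step) % 12 for step in MAJOR_SCALE}
        return sum(pc in scale for pc in pitch_classes)

    return max(range(12), key=coverage)  # ties resolve to the lowest tonic


def snap_to_key(pitches, tonic):
    """Snap each (possibly detuned) MIDI pitch to the nearest note of the given major key."""
    scale = {(tonic + step) % 12 for step in MAJOR_SCALE}
    snapped = []
    for p in pitches:
        base = math.floor(p)
        # Adjacent major-scale notes are at most 2 semitones apart, so this window
        # always contains the nearest in-key note.
        candidates = [n for n in range(base - 2, base + 3) if n % 12 in scale]
        snapped.append(min(candidates, key=lambda n: abs(n - p)))
    return snapped


# Usage: an ascending C-major scale plus a flat final note. Within C major, 63.4 is
# nearer to E (64) than to D (62), although naive rounding would give E-flat (63).
phrase = [60.1, 62.0, 64.1, 65.0, 67.0, 69.1, 71.0, 72.0, 63.4]
tonic = best_major_key(phrase)
corrected = snap_to_key(phrase, tonic)
```

The key estimate uses every note in the phrase, which is what lets the last note be resolved differently from plain rounding; a language model can capture far richer context (melodic patterns, rhythm, modulation) than a single key.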
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a stationary pitch predictor to estimate the perceived pitch of each note
Employs a music language model for context-aware note pitch prediction
Applies note-level correction that preserves expressive pitch deviations
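The stationary-pitch and note-level-correction ideas can be sketched in a few lines of Python. Here the learned stationary pitch predictor is replaced by a hypothetical heuristic (the median pitch of a note's central frames, skipping onset and offset transitions), and correction is a constant shift of the whole note in the semitone domain, so vibrato and other within-note deviations keep their shape. The names `stationary_pitch` and `correct_note`, the `trim_ratio` parameter, and the optional `tolerance_semitones` threshold for leaving small deviations untouched are assumptions, not the paper's algorithm.

```python
import math
import statistics


def hz_to_midi(f_hz):
    """Convert frequency in Hz to a (fractional) MIDI note number; A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def midi_to_hz(midi):
    """Inverse of hz_to_midi."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def stationary_pitch(contour_midi, trim_ratio=0.2):
    """Heuristic perceived pitch: median of the note's central frames.

    trim_ratio of the frames is dropped at each end to skip onset/offset glides.
    """
    n = len(contour_midi)
    k = int(n * trim_ratio)
    core = contour_midi[k:n - k] or contour_midi  # fall back if the note is very short
    return statistics.median(core)


def correct_note(contour_hz, target_midi, trim_ratio=0.2, tolerance_semitones=0.0):
    """Shift a voiced note's f0 contour so its stationary pitch lands on target_midi.

    The whole contour is multiplied by one factor, i.e. shifted by a constant number
    of semitones, so vibrato depth and rate are unchanged. If the required shift is
    within tolerance_semitones, the note is returned untouched.
    """
    contour_midi = [hz_to_midi(f) for f in contour_hz]
    shift = target_midi - stationary_pitch(contour_midi, trim_ratio)
    if abs(shift) <= tolerance_semitones:
        return list(contour_hz)
    factor = 2.0 ** (shift / 12.0)
    return [f * factor for f in contour_hz]


# Usage: a 1 s note (100 frames at 10 ms) sung roughly 40 cents sharp of E4 (MIDI 64),
# with 5.5 Hz vibrato of +/-0.3 semitones.
contour_hz = [midi_to_hz(64.4 + 0.3 * math.sin(2 * math.pi * 5.5 * i / 100))
              for i in range(100)]
fixed_hz = correct_note(contour_hz, target_midi=64)
```

Shifting by a multiplicative factor in Hz, rather than adding a constant in Hz, is what keeps vibrato depth constant in musical (semitone) terms.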
Sungjae Kim
Computer Science and Electronical Engineering Department, Handong Global University, Pohang 37554, Korea
Kihyun Na
Computer Science and Electronical Engineering Department, Handong Global University, Pohang 37554, Korea
Jinyoung Choi
Computer Science and Electronical Engineering Department, Handong Global University, Pohang 37554, Korea
Injung Kim
Professor, Handong Global University
AI, deep learning, image analysis and synthesis, speech synthesis, smart factory