Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition

📅 2025-02-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited robustness of automatic speech recognition (ASR) systems against non-native and accented speech, this paper proposes a data-driven mispronunciation modeling approach. First, it leverages attention maps to achieve unsupervised phoneme-level alignment between non-native and native utterances, without requiring prior linguistic knowledge. Based on this alignment, the method automatically discovers phoneme-level mispronunciation patterns and performs end-to-end adaptation of the ASR model. Evaluated on native English speech, the approach improves recognition by 5.7%; on Korean-accented English speech, it improves by 12.8%, demonstrating substantial cross-accent generalization. The core contribution lies in repurposing the attention mechanism as an interpretable tool for analyzing pronunciation deviations, thereby establishing a novel paradigm for low-resource accent adaptation.
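The summary's alignment-then-discovery idea can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's actual code: it assumes a cross-attention matrix between a non-native phone sequence and its native counterpart, aligns each non-native phone to the native phone it attends to most, and tallies the deviations as candidate mispronunciation patterns.

```python
import numpy as np

# Native phone sequence for "rabbit" and a Korean-accented realization
# (the r -> l substitution is a common L1-transfer pattern).
native = ["r", "ae", "b", "ih", "t"]
nonnative = ["l", "ae", "b", "ih", "t"]

# Toy attention weights: rows = non-native phones, cols = native phones.
# In the paper these would come from a trained attention-based model.
attn = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.80, 0.05, 0.03, 0.02],
    [0.05, 0.10, 0.75, 0.05, 0.05],
    [0.02, 0.03, 0.10, 0.80, 0.05],
    [0.05, 0.05, 0.05, 0.10, 0.75],
])

def discover_patterns(attn, nonnative, native):
    """Align phones via argmax over attention rows; return a count of
    (native_phone, realized_phone) pairs where they disagree."""
    patterns = {}
    for i, row in enumerate(attn):
        j = int(np.argmax(row))          # best-attended native phone
        if nonnative[i] != native[j]:    # deviation = candidate pattern
            pair = (native[j], nonnative[i])
            patterns[pair] = patterns.get(pair, 0) + 1
    return patterns

print(discover_patterns(attn, nonnative, native))  # {('r', 'l'): 1}
```

Aggregated over a corpus, such pattern counts could drive the adaptation step the summary describes; the argmax alignment here stands in for whatever alignment criterion the paper actually uses.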

📝 Abstract
Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, relying on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches using speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly Korean speakers. Our method offers practical advancements for robust Automatic Speech Recognition (ASR) systems, particularly for situations where prior linguistic knowledge is not applicable.
Problem

Research questions and friction points this paper is trying to address.

Accent Recognition
Speech Fluency
Non-native Speaker
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Maps
Machine Learning
Non-native Speaker Recognition
Anna Seo Gyeong Choi
Cornell University
ai ethics · algorithmic fairness · automatic speech recognition · voice biomarker · clinical speech
Jonghyeon Park
NAVER Cloud Corporation, Republic of Korea
Myungwoo Oh
NAVER Cloud Corporation, Republic of Korea