🤖 AI Summary
To address the limited robustness of automatic speech recognition (ASR) systems against non-native and accented speech, this paper proposes a data-driven mispronunciation modeling approach. First, it leverages attention maps to achieve unsupervised phoneme-level alignment between non-native and native utterances, without requiring prior linguistic knowledge. Based on this alignment, the method automatically discovers phoneme-level mispronunciation patterns and performs end-to-end adaptation of the ASR model. Evaluated on native English speech, the approach improves recognition performance by 5.7%; on Korean-accented English speech, performance improves by 12.8%, demonstrating substantial cross-accent generalization. The core contribution lies in repurposing the attention mechanism as an interpretable tool for analyzing pronunciation deviations, thereby establishing a novel paradigm for low-resource accent adaptation.
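To make the alignment idea concrete, here is a minimal toy sketch of how a cross-attention map between a non-native and a native phone sequence could yield phone-to-phone alignments and surface candidate mispronunciation patterns. The paper's actual alignment procedure is not detailed in this summary; the attention values, phone labels, and the simple per-row argmax strategy below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def align_phones(attention, native_phones, nonnative_phones):
    """Align each non-native phone to the native phone receiving the
    highest attention weight (illustrative; monotonicity not enforced).

    attention: array of shape (len(nonnative_phones), len(native_phones))
    """
    best = attention.argmax(axis=1)
    return [(nn, native_phones[i]) for nn, i in zip(nonnative_phones, best)]

# Toy example: "rice" /r ay s/ realized with a Korean-accented /l/ onset.
# Attention rows are hypothetical, not taken from the paper.
att = np.array([
    [0.8, 0.1, 0.1],  # non-native /l/ attends mostly to native /r/
    [0.1, 0.8, 0.1],  # /ay/ -> /ay/
    [0.1, 0.1, 0.8],  # /s/  -> /s/
])
pairs = align_phones(att, ["r", "ay", "s"], ["l", "ay", "s"])
# Mismatched pairs such as ("l", "r") become candidate
# mispronunciation patterns for adapting the ASR model.
mismatches = [p for p in pairs if p[0] != p[1]]
```

In this sketch, a pair whose two phones differ flags a systematic substitution (here /r/ → /l/), which is the kind of phoneme-level pattern the data-driven approach would feed into model adaptation.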
📝 Abstract
Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, relying on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches that use speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly Korean speakers. Our method offers practical advancements for robust Automatic Speech Recognition (ASR) systems, particularly in situations where prior linguistic knowledge is not applicable.