Evaluating Logit-Based GOP Scores for Mispronunciation Detection

📅 2025-06-02
🤖 AI Summary
Conventional Goodness of Pronunciation (GOP) scores, derived from the softmax posterior probabilities of an automatic speech recognition (ASR) model, can suffer from overconfidence and poor phoneme separation, which limits their use for second-language (L2) pronunciation error detection. Method: This study compares GOP scores computed directly from the model's logits, including a maximum-logit GOP and a combination of probability- and logit-based scores, against probability-based GOP. Contribution/Results: On L2 English speech from Dutch and Mandarin native speakers, logit-based GOP outperforms probability-based GOP in mispronunciation classification, although the gain depends on dataset characteristics. Maximum-logit GOP shows the strongest correlation with human ratings, and the combined score balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting could improve L2 pronunciation assessment.

📝 Abstract
Pronunciation assessment relies on goodness of pronunciation (GOP) scores, traditionally derived from softmax-based posterior probabilities. However, posterior probabilities may suffer from overconfidence and poor phoneme separation, limiting their effectiveness. This study compares logit-based GOP scores with probability-based GOP scores for mispronunciation detection. We conducted experiments on two datasets of L2 English speech produced by Dutch and Mandarin speakers, assessing classification performance and correlation with human ratings. Logit-based methods outperform probability-based GOP in classification, but their effectiveness depends on dataset characteristics. The maximum logit GOP shows the strongest alignment with human perception, while a combination of different GOP scores balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting improve pronunciation assessment.
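
To make the contrast between the two score families concrete, the sketch below computes a probability-based GOP and a maximum-logit GOP for one aligned phoneme segment. The exact definitions used in the paper are not given on this page, so both formulas are assumptions: the probability-based score is taken as the mean log softmax posterior of the target phoneme over the segment's frames, and the maximum-logit score as the largest raw logit of the target phoneme over those frames. The function names and the toy `segment` values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gop_probability(frame_logits, target):
    """Assumed probability-based GOP: mean log posterior of the target phoneme."""
    log_probs = [math.log(softmax(frame)[target]) for frame in frame_logits]
    return sum(log_probs) / len(log_probs)


def gop_max_logit(frame_logits, target):
    """Assumed maximum-logit GOP: highest raw target logit in the segment."""
    return max(frame[target] for frame in frame_logits)


# Toy segment: 3 frames, 3 phoneme classes, target phoneme at index 0.
# The values are invented for illustration only.
segment = [
    [4.0, 1.0, 0.5],
    [6.0, 5.5, 0.2],
    [3.0, 0.5, 0.1],
]

print("probability-based GOP:", gop_probability(segment, target=0))
print("max-logit GOP:", gop_max_logit(segment, target=0))
```

Softmax normalises each frame, so the probability-based score is unchanged when a constant is added to every logit, whereas the maximum-logit score keeps the raw magnitude of the model's evidence for the target phoneme. This is one way to see why the two families can rank the same segments differently.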
Problem

Research questions and friction points this paper is trying to address.

Compares logit-based and probability-based GOP scores for mispronunciation detection
Evaluates performance on L2 English datasets from Dutch and Mandarin speakers
Proposes hybrid GOP methods to improve pronunciation assessment accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logit-based GOP scores outperform probability-based ones
Maximum logit GOP aligns best with human perception
Hybrid GOP methods combine probability and logit features
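
The two evaluation views mentioned above, binary mispronunciation detection and agreement with human ratings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: the page does not say which correlation coefficient, threshold, or classifier was used, so Pearson correlation and a fixed score threshold are stand-ins, and all names and data values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def detect_errors(gop_scores, threshold):
    """Flag a phoneme as mispronounced when its GOP falls below the threshold."""
    return [score < threshold for score in gop_scores]


def accuracy(predicted, gold):
    """Fraction of phonemes where the prediction matches the annotation."""
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Invented example data: one GOP score, one human rating (higher is better),
# and one binary error annotation per phoneme.
gop_scores = [-0.2, -3.1, -0.5, -2.4, -0.1]
human_ratings = [5, 1, 4, 2, 5]
is_error = [False, True, False, True, False]

predicted = detect_errors(gop_scores, threshold=-1.0)
print("detection accuracy:", accuracy(predicted, is_error))
print("correlation with ratings:", pearson(gop_scores, human_ratings))
```

Running the same evaluation once per GOP variant makes the comparison in the list above reproducible on any annotated dataset; in practice the threshold would be tuned on held-out data, not fixed by hand.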
Aditya Kamlesh Parikh
Centre for Language Studies, Radboud University, the Netherlands
Cristian Tejedor-Garcia
Centre for Language Studies, Radboud University, the Netherlands
Catia Cucchiarini
Senior Researcher, Radboud University
speech science, phonetics, speech technology, CALL, L2 acquisition
Helmer Strik
Associate Professor, Radboud University; co-founder and CSO, NovoLanguage
Speech Science, Phonetics, Language & Speech Technology, CALL