Evaluating Logit-Based GOP Scores for Mispronunciation Detection

📅 2025-06-02

🤖 AI Summary

Problem: Conventional Goodness of Pronunciation (GOP) scores, derived from softmax posterior probabilities in automatic speech recognition (ASR), can be overconfident and separate phonemes poorly, which limits their sensitivity for second-language (L2) pronunciation error detection. Method: This study compares GOP scores computed directly from ASR model logits, including a max-logit GOP and a combined score that mixes probability and logit features, against probability-based GOP. Contribution/Results: On English speech from native Dutch and Mandarin speakers, logit-based GOP outperforms probability-based GOP in binary error detection, although the gain depends on dataset characteristics. Max-logit GOP attains the highest correlation with expert ratings, significantly higher than probability-based GOP (p < 0.01), and the combined score balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting improve L2 pronunciation assessment.

📝 Abstract

Pronunciation assessment relies on goodness of pronunciation (GOP) scores, traditionally derived from softmax-based posterior probabilities. However, posterior probabilities may suffer from overconfidence and poor phoneme separation, limiting their effectiveness. This study compares logit-based GOP scores with probability-based GOP scores for mispronunciation detection. We conducted our experiment on two L2 English speech datasets spoken by Dutch and Mandarin speakers, assessing classification performance and correlation with human ratings. Logit-based methods outperform probability-based GOP in classification, but their effectiveness depends on dataset characteristics. The maximum logit GOP shows the strongest alignment with human perception, while a combination of different GOP scores balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting improve pronunciation assessment.
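To make the contrast between the two score families concrete, here is a minimal sketch. It assumes a common formulation in which the probability-based GOP is the mean log posterior of the target phoneme over the frames aligned to it, and the maximum logit GOP is the largest raw logit of the target phoneme over those same frames. These definitions, the function names, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def gop_probability(frame_logits, target):
    """Assumed probability-based GOP: mean log posterior of the target phoneme."""
    log_posts = [log_softmax(row)[target] for row in frame_logits]
    return sum(log_posts) / len(log_posts)

def gop_max_logit(frame_logits, target):
    """Assumed max-logit GOP: largest raw target logit over the segment."""
    return max(row[target] for row in frame_logits)

# Toy segments (hypothetical): 3 frames x 4 phoneme classes, target index 0.
strong = [[8.0, 1.0, 0.5, 0.0], [9.0, 1.5, 0.0, 0.2], [7.5, 0.5, 0.3, 0.1]]
weaker = [[4.0, 1.0, 0.5, 0.0], [5.0, 1.5, 0.0, 0.2], [3.5, 0.5, 0.3, 0.1]]

for name, seg in (("strong", strong), ("weaker", weaker)):
    print(name, round(gop_probability(seg, 0), 4), gop_max_logit(seg, 0))
```

In this toy case the softmax saturates, so both segments receive probability-based scores close to zero, while the raw logits (9.0 versus 5.0) still separate them. This is one way to picture the overconfidence issue the abstract describes; it says nothing about how large the effect is on real data.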

Problem

Research questions and friction points this paper is trying to address.

Compares logit-based and probability-based GOP scores for mispronunciation detection

Evaluates performance on L2 English datasets from Dutch and Mandarin speakers

Proposes hybrid GOP methods to improve pronunciation assessment accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Logit-based GOP scores outperform probability-based ones in classification, though their effectiveness depends on dataset characteristics

Maximum logit GOP aligns best with human perception

Hybrid GOP methods combine probability and logit features
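The summary does not say how probability and logit features are combined. Purely as an illustration, one conceivable scheme standardizes each score type across phoneme segments and takes a weighted average. The function names, the z-score step, and the weight `alpha` below are all hypothetical and should not be read as the paper's method.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize scores so the two score types share a comparable scale."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def hybrid_gop(prob_scores, logit_scores, alpha=0.5):
    """Hypothetical hybrid: weighted mean of standardized GOP scores.

    alpha is an assumed mixing weight in [0, 1]; alpha=1 uses only the
    probability-based scores, alpha=0 only the logit-based scores.
    """
    if len(prob_scores) != len(logit_scores):
        raise ValueError("score lists must have the same length")
    zp = zscore(prob_scores)
    zl = zscore(logit_scores)
    return [alpha * p + (1 - alpha) * l for p, l in zip(zp, zl)]

# Toy per-segment scores (hypothetical); the third segment is the weakest.
prob_scores = [-0.1, -0.2, -3.0]
logit_scores = [9.0, 8.0, 2.0]
print(hybrid_gop(prob_scores, logit_scores))
```

Standardizing first keeps the logit scores, which are unbounded, from dominating the log posteriors, which are at most zero. In practice the weight would have to be tuned on held-out data with human ratings.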
