Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses accent-robust automatic speech recognition (ASR) for accents with scarce speech data, using *only native speech data*. It draws on the "interlanguage speech intelligibility benefit" (ISIB), a phenomenon in human speech perception in which non-native listeners who share the speaker's native language understand non-native speech better than native listeners do. Treating discrete tokens extracted from self-supervised learning (SSL) models as a representation of human speech perception, the study analyzes how robust discrete token-based ASR is to non-native speech while varying the language used to train the tokenization, which serves as a technical implementation of ISIB. The results show that ISIB also occurs in discrete token-based ASR. Because the approach needs no non-native or accent-labeled speech, it is expected to apply to a wide range of accents for which speech data is scarce.

📝 Abstract
In this study, we gained insights that contribute to achieving accent-robust ASR using only native speech data. In human perception of non-native speech, a phenomenon known as the "interlanguage speech intelligibility benefit" (ISIB) is observed, in which non-native listeners who share a native language with the speaker understand the speech better than even native listeners do. Based on the idea that discrete tokens extracted from self-supervised learning (SSL) models represent the human perception of speech, we conducted an analytical study on the robustness of discrete token-based ASR to non-native speech, varying the language used for training the tokenization, which is viewed as a technical implementation of ISIB. The results showed that ISIB did occur in the discrete token-based ASR. Since our approach relies only on native speech data to simulate the behavior of human perception, it is expected to be applicable to a wide range of accents for which speech data is scarce.
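
To make "varying the language used for training the tokenization" concrete, here is a minimal sketch of one common way to turn frame-level SSL features into discrete tokens: k-means clustering. The abstract does not name the quantizer, so k-means is an assumption here, as are the repeat-collapsing step, the random arrays standing in for SSL features, and every name used (`fit_kmeans`, `tokenize`, `collapse_repeats`, `codebook_l1`, `codebook_l2`). The sketch only shows that the same accented utterance can be tokenized with codebooks trained on native speech of different languages. It does not reproduce the paper's setup or results.

```python
import numpy as np


def tokenize(features, centroids):
    """Map each frame (row) to the index of its nearest centroid."""
    # Squared Euclidean distances, shape (num_frames, num_clusters).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def fit_kmeans(features, n_clusters, n_iters=20, seed=0):
    """Fit a k-means codebook with plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from distinct, randomly chosen frames.
    init_idx = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[init_idx].copy()
    for _ in range(n_iters):
        labels = tokenize(features, centroids)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


def collapse_repeats(tokens):
    """Merge runs of identical consecutive tokens (an assumed post-processing step)."""
    tokens = np.asarray(tokens)
    if tokens.size == 0:
        return tokens
    keep = np.concatenate(([True], tokens[1:] != tokens[:-1]))
    return tokens[keep]


# Placeholder data: random arrays stand in for SSL features of native speech
# in two languages and for one non-native (accented) utterance.
rng = np.random.default_rng(0)
feat_dim = 16
native_l1_feats = rng.normal(loc=0.0, size=(500, feat_dim))  # speaker's native language
native_l2_feats = rng.normal(loc=0.5, size=(500, feat_dim))  # language being spoken
accented_utt = rng.normal(loc=0.25, size=(40, feat_dim))

# One codebook per tokenizer-training language; both use native speech only.
codebook_l1 = fit_kmeans(native_l1_feats, n_clusters=8)
codebook_l2 = fit_kmeans(native_l2_feats, n_clusters=8)

# The same accented utterance yields a different token sequence per codebook.
tokens_l1 = collapse_repeats(tokenize(accented_utt, codebook_l1))
tokens_l2 = collapse_repeats(tokenize(accented_utt, codebook_l2))
```

In a setup like this, the token sequences from each codebook would be fed to the downstream ASR model, and recognition accuracy on non-native speech would be compared across tokenizer-training languages.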
Problem

Research questions and friction points this paper is trying to address.

Achieving accent-robust ASR using only native speech data
Investigating interlanguage speech intelligibility benefit in discrete tokens
Enhancing ASR robustness to non-native accents with limited data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using discrete tokens for accent-robust ASR
Leveraging self-supervised learning models
Simulating ISIB with native speech data