Geolocation-Aware Robust Spoken Language Identification

📅 2025-08-23
🤖 AI Summary
Existing spoken language identification (LID) models built on self-supervised learning (SSL) struggle to classify dialects and accents of the same language as a single class. This paper proposes geolocation-aware LID, which adds language-level geolocation prediction as an auxiliary task and injects the predicted geolocation vector into intermediate representations as a conditioning signal, guiding the model toward more robust, unified language representations. The method is evaluated on six multilingual datasets. It achieves a state-of-the-art 97.7% accuracy on the FLEURS benchmark and a 9.7% relative improvement on the ML-SUPERB 2.0 dialect set. The core contribution is integrating geographic priors into SSL-based LID, which improves robustness to dialect and accent variation through conditional representation learning.

📝 Abstract
While Self-supervised Learning (SSL) has significantly improved Spoken Language Identification (LID), existing models often struggle to consistently classify dialects and accents of the same language as a unified class. To address this challenge, we propose geolocation-aware LID, a novel approach that incorporates language-level geolocation information into the SSL-based LID model. Specifically, we introduce geolocation prediction as an auxiliary task and inject the predicted vectors into intermediate representations as conditioning signals. This explicit conditioning encourages the model to learn more unified representations for dialectal and accented variations. Experiments across six multilingual datasets demonstrate that our approach improves robustness to intra-language variations and unseen domains, achieving new state-of-the-art accuracy on FLEURS (97.7%) and a 9.7% relative improvement on the ML-SUPERB 2.0 dialect set.

Problem

Research questions and friction points this paper is trying to address.

Classifying dialects and accents as unified language classes
Improving robustness to intra-language variations in LID
Enhancing model generalization across unseen domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geolocation-aware SSL for language identification
Auxiliary geolocation prediction as conditioning signals
Injecting geographic vectors into intermediate representations
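
The three ideas listed above can be illustrated with a minimal sketch. It is not the paper's implementation: the layer sizes, the mean pooling, the additive form of the conditioning, the squared-error geolocation loss, the `geo_weight` value, and the normalized target coordinates are all assumptions chosen for illustration, and every function name is hypothetical. The sketch uses only the Python standard library and shows a forward pass, not training.

```python
import math
import random

def init_linear(rng, out_dim, in_dim, scale=0.1):
    """Create a small random weight matrix (list of rows) and a zero bias."""
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

def linear(x, weight, bias):
    """Apply y = W x + b to a single vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def mean_pool(frames):
    """Average frame-level vectors over time."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def geo_condition(frames, geo_head, geo_proj):
    """Predict a geolocation vector from intermediate frames, then add its
    projection back to every frame (assumed additive conditioning)."""
    geo = linear(mean_pool(frames), *geo_head)   # assumed 2-dim (lat, lon)-like output
    cond = linear(geo, *geo_proj)                # project back to the hidden size
    conditioned = [[h + c for h, c in zip(frame, cond)] for frame in frames]
    return conditioned, geo

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def multitask_loss(lid_logits, lang_id, geo_pred, geo_target, geo_weight=0.5):
    """LID loss plus a weighted auxiliary geolocation loss (weight is an assumption)."""
    geo_loss = sum((p - t) ** 2 for p, t in zip(geo_pred, geo_target)) / len(geo_target)
    return cross_entropy(lid_logits, lang_id) + geo_weight * geo_loss

rng = random.Random(0)
hidden, n_langs, n_frames = 8, 3, 5
# Stand-in for intermediate SSL encoder outputs (random values, not real speech features).
frames = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(n_frames)]
geo_head = init_linear(rng, 2, hidden)
geo_proj = init_linear(rng, hidden, 2)
lid_head = init_linear(rng, n_langs, hidden)

conditioned, geo_pred = geo_condition(frames, geo_head, geo_proj)
lid_logits = linear(mean_pool(conditioned), *lid_head)
# Hypothetical normalized coordinates for the target language.
loss = multitask_loss(lid_logits, lang_id=1, geo_pred=geo_pred, geo_target=[0.52, 0.13])
```

Because the geolocation target is defined per language in this sketch, every dialect or accent of that language is pulled toward the same vector, which is the intuition behind using it as a conditioning signal.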