Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models

📅 2025-06-04
🤖 AI Summary
This study investigates how well self-supervised learning (SSL) speech models recognize tone in three low-resource tonal languages of Northeast India (Angami, Ao, and Mizo), where scarce annotated data and linguistic diversity hinder robust modeling.
Method: Using four Wav2Vec 2.0 base models pre-trained on tonal or non-tonal languages, we extract features from each hidden layer and train linear probes to measure how well each layer's representations discriminate between tones. Comparing the models shows whether a tonal pre-training language helps.
Contribution/Results: Intermediate SSL layers yield the most discriminative tone representations, and this pattern holds whether the pre-training language is tonal or non-tonal. Tone inventory, tone type, and dialectal variation all affect performance. Mizo achieves the highest tone recognition accuracy and Angami the lowest. The work offers an interpretable, layer-wise analysis of SSL representations for low-resource tonal languages and points to directions for improving tone recognition in such settings.
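
The layer-wise linear probing described above can be sketched as follows. This is a minimal, self-contained illustration under our own assumptions, not the authors' code: the probe is a softmax (multinomial logistic) regression trained by full-batch gradient descent, and the data are synthetic, with tone information placed only in the middle "layer" to mimic the reported pattern. With real audio, per-layer features could plausibly come from the `hidden_states` tuple that Hugging Face's `Wav2Vec2Model` returns when called with `output_hidden_states=True` (the initial embedding output plus one entry per Transformer layer), pooled over each syllable; that pooling step and the probe hyperparameters (`epochs`, `lr`) are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W: Matrix, b: List[float], x: Sequence[float]) -> List[float]:
    """Linear scores W x + b, one per tone class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_linear_probe(feats: Matrix, labels: List[int], n_classes: int,
                       epochs: int = 200, lr: float = 0.1) -> Tuple[Matrix, List[float]]:
    """Fit a softmax-regression probe on frozen features by full-batch gradient descent."""
    dim, n = len(feats[0]), len(feats)
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(feats, labels):
            probs = softmax(class_scores(W, b, x))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(score_c)
                grad_b[c] += err
                for d in range(dim):
                    grad_W[c][d] += err * x[d]
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / n
            for d in range(dim):
                W[c][d] -= lr * grad_W[c][d] / n
    return W, b


def predict(W: Matrix, b: List[float], x: Sequence[float]) -> int:
    """Return the index of the highest-scoring tone class."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def layerwise_probe(layers: List[Matrix], labels: List[int], n_classes: int,
                    train_idx: Sequence[int], test_idx: Sequence[int]) -> List[float]:
    """Train one independent probe per layer and return its held-out accuracy."""
    accuracies = []
    for feats in layers:
        W, b = train_linear_probe([feats[i] for i in train_idx],
                                  [labels[i] for i in train_idx], n_classes)
        correct = sum(predict(W, b, feats[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical synthetic data, NOT the paper's: 90 syllables, 3 tone classes,
# 3 "layers" of 4-dim pooled features; only the middle layer encodes tone.
rng = random.Random(0)
labels = [i % 3 for i in range(90)]


def noise_layer() -> Matrix:
    """A layer whose features carry no tone information."""
    return [[rng.uniform(-1, 1) for _ in range(4)] for _ in labels]


tonal_layer = [[2.0 * (d == y) + rng.uniform(-0.3, 0.3) for d in range(4)] for y in labels]
layers = [noise_layer(), tonal_layer, noise_layer()]

accs = layerwise_probe(layers, labels, n_classes=3, train_idx=range(60), test_idx=range(60, 90))
best = max(range(len(accs)), key=accs.__getitem__)
print("held-out accuracy per layer:", accs, "| best layer:", best)
```

Each probe is trained separately on frozen features, so differences in held-out accuracy reflect how linearly accessible tone is in that layer rather than any change to the SSL model itself.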

📝 Abstract
This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages from North-East India: Angami, Ao, and Mizo. We evaluate four wav2vec 2.0 base models pre-trained on tonal and non-tonal languages. We analyze tone-wise performance across layers for all three languages and compare the models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models are the most important for tone recognition, regardless of whether the pre-training language is tonal or non-tonal. We also find that tone inventory, tone type, and dialectal variation affect tone recognition. These findings provide useful insights into the strengths and weaknesses of SSL-based embeddings for tonal languages and highlight the potential for improving tone recognition in low-resource settings. The source code is available on GitHub.
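
"Tone-wise performance" can be read as accuracy computed separately for each tone, i.e. per-class recall. The sketch below shows one way to compute it; it is an illustration under our own assumptions, not the paper's evaluation script, and the tone labels `H`, `M`, `L` and the predictions are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_tone_accuracy(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of tokens of each gold tone that were labelled correctly (per-class recall)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    return {tone: hits[tone] / count for tone, count in totals.items()}


def macro_accuracy(per_tone: Dict[Hashable, float]) -> float:
    """Unweighted mean over tones, so rare tones count as much as frequent ones."""
    return sum(per_tone.values()) / len(per_tone)


# Hypothetical tone labels and predictions, for illustration only.
gold = ["H", "H", "M", "M", "L", "L"]
pred = ["H", "M", "M", "M", "L", "H"]
scores = per_tone_accuracy(gold, pred)
print(scores)  # {'H': 0.5, 'M': 1.0, 'L': 0.5}
print(macro_accuracy(scores))
```

Running this on each layer's probe predictions would give a tone-by-layer table, which makes it visible when a high overall accuracy hides a tone that is rarely recognized.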
Problem

Exploring SSL models for tone recognition in low-resource languages of North-East India
Analyzing layer-wise performance of wav2vec 2.0 models for tonal languages
Investigating the impact of tone inventory, tone type, and dialectal variation on recognition
Innovation

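Layer-wise analysis of SSL models for low-resource tonal languages
Middle layers are the most informative for tone recognition, whether pre-training used tonal or non-tonal languages
Tone inventory, tone type, and dialectal variation affect recognition accuracy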