How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the limited cross-lingual transferability and severe domain mismatch of self-supervised learning (SSL) pre-trained models in automatic speech recognition (ASR) for low-resource languages, this paper proposes a lightweight adapter method with *intermediate warm-start*. With the SSL backbone frozen, only 1–5% of the total parameters are fine-tuned. A two-stage progressive adaptation uses an extra intermediate stage to warm up the adapter and the downstream model initialization; this warm-start mitigates speech feature distribution shift and substantially improves generalization to unseen languages. Evaluated on the ML-SUPERB benchmark, the approach achieves up to a 28% relative reduction in character/phoneme error rate over conventional efficient fine-tuning, alleviating a key bottleneck in low-resource cross-lingual ASR adaptation.
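The two-stage schedule described above can be illustrated with a toy sketch. Everything below is a hedged illustration, not the paper's implementation: a linear least-squares probe stands in for the adapter plus downstream model (the SSL backbone stays frozen and is not modeled), and the "intermediate" and "target" sets stand in for seen and unseen languages that share underlying structure.

```python
import numpy as np

def sgd_step(w, X, y, lr=0.1):
    """One mean-squared-error gradient step on a linear probe (a stand-in
    for the trainable adapter + downstream model)."""
    return w - lr * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
dim = 8
w_true = rng.normal(size=dim)                 # shared structure across "languages"
X_mid, X_tgt = rng.normal(size=(2, 64, dim))  # intermediate / target features
y_mid, y_tgt = X_mid @ w_true, X_tgt @ w_true

def target_loss(w):
    return float(np.mean((X_tgt @ w - y_tgt) ** 2))

# Cold start: adapt to the target "language" from a random initialization,
# with only a small adaptation budget (5 steps).
w_cold = rng.normal(size=dim)
for _ in range(5):
    w_cold = sgd_step(w_cold, X_tgt, y_tgt)

# Warm start: stage 1 trains on intermediate data first; stage 2 reuses
# those weights as the initialization for the same small target budget.
w_warm = rng.normal(size=dim)
for _ in range(200):
    w_warm = sgd_step(w_warm, X_mid, y_mid)   # stage 1: intermediate warm-up
for _ in range(5):
    w_warm = sgd_step(w_warm, X_tgt, y_tgt)   # stage 2: target adaptation

print(f"cold-start loss: {target_loss(w_cold):.4f}")
print(f"warm-start loss: {target_loss(w_warm):.4f}")
```

Under the same small target-language budget, the warm-started probe lands far closer to the solution, which is the intuition behind warming up the downstream initialization before adapting to an unseen language.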

📝 Abstract
The utilization of speech Self-Supervised Learning (SSL) models achieves impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages. Typical solutions like fine-tuning the SSL model suffer from high computation costs while using frozen SSL models as feature extractors comes with poor performance. To handle these issues, we extend a conventional efficient fine-tuning scheme based on the adapter. We add an extra intermediate adaptation to warm up the adapter and downstream model initialization. Remarkably, we update only 1-5% of the total model parameters to achieve the adaptation. Experimental results on the ML-SUPERB dataset show that our solution outperforms conventional efficient fine-tuning. It achieves up to a 28% relative improvement in the Character/Phoneme error rate when adapting to unseen languages.
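The adapter recipe in the abstract (frozen SSL backbone, only 1–5% of parameters updated) can be sketched with a minimal bottleneck adapter. This is an assumed generic design, not code from the paper, and the dimensions and backbone size below are illustrative guesses for a Base-sized SSL model:

```python
import numpy as np

class BottleneckAdapter:
    """Minimal bottleneck adapter: down-project, ReLU, up-project, residual.
    With W_up initialized to zero the adapter starts as the identity map,
    so inserting it does not perturb the frozen backbone's features."""
    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.W_up = np.zeros((bottleneck, dim))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        z = np.maximum(h @ self.W_down, 0.0)  # down-projection + ReLU
        return h + z @ self.W_up              # up-projection + residual add

# Illustrative parameter budget (all sizes are assumptions, not paper values):
dim, bottleneck, n_layers = 768, 32, 12       # a Base-sized transformer stack
backbone_params = 95_000_000                  # rough size of a Base SSL model
adapter_params = n_layers * 2 * dim * bottleneck
print(f"trainable fraction: {adapter_params / backbone_params:.2%}")

h = np.random.default_rng(1).normal(size=(4, dim))
out = BottleneckAdapter(dim, bottleneck)(h)
assert np.allclose(out, h)                    # identity at initialization
```

With one adapter per layer, the trainable share stays well under the 5% upper bound quoted in the abstract, which is what makes this scheme cheap relative to full fine-tuning.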
Problem

Research questions and friction points this paper is trying to address.

Speech Self-Supervised Learning (SSL)
Low-Resource Languages
Automatic Speech Recognition (ASR)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter-based fine-tuning
Low-resource language adaptation
Self-supervised learning (SSL)
👥 Authors
Shih-Heng Wang
National Taiwan University, Taiwan
Zih-Ching Chen
National Taiwan University, Taiwan
Jiatong Shi
Carnegie Mellon University, US
Ming-To Chuang
National Taiwan University, Taiwan
Guan-Ting Lin
National Taiwan University, Taiwan
Speech Processing, Natural Language Processing, Machine Learning
Kuan-Po Huang
National Taiwan University, Taiwan
David Harwath
The University of Texas at Austin, US
Speech and Language Processing, Computer Vision, Natural Language Processing, Artificial Intelligence, Machine Learning
Shang-Wen Li
FAIR, US
Hung-yi Lee
National Taiwan University, Taiwan
Deep Learning, Spoken Language Understanding, Speech Processing