🤖 AI Summary
To address the limited cross-lingual transferability and domain mismatch of self-supervised learning (SSL) pre-trained models in automatic speech recognition (ASR) for low-resource languages, this paper proposes a lightweight adapter method with *intermediate warm-start*. The SSL backbone stays frozen, and only 1–5% of the total parameters are fine-tuned. A two-stage progressive adaptation jointly warms up the adapter and the downstream model initialization; this intermediate warm-start mitigates the shift in speech feature distributions and substantially improves generalization to unseen languages. Evaluated on the ML-SUPERB benchmark, the approach achieves up to a 28% relative reduction in character/phoneme error rate over conventional efficient fine-tuning, easing a key bottleneck in low-resource cross-lingual ASR adaptation.
📝 Abstract
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR). However, in low-resource language ASR, they encounter a domain mismatch between the pre-training languages and the target low-resource languages. Typical solutions are unsatisfying: fine-tuning the full SSL model incurs high computation costs, while using a frozen SSL model as a feature extractor yields poor performance. To handle these issues, we extend a conventional adapter-based efficient fine-tuning scheme with an extra intermediate adaptation stage that warms up the adapter and the downstream model initialization. Remarkably, we update only 1–5% of the total model parameters to achieve the adaptation. Experimental results on the ML-SUPERB dataset show that our solution outperforms conventional efficient fine-tuning, achieving up to a 28% relative improvement in Character/Phoneme Error Rate when adapting to unseen languages.
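To make the "adapter on a frozen backbone" idea concrete, here is a minimal sketch of a standard bottleneck adapter and its parameter budget. This is not the paper's implementation: the hidden size, bottleneck width, layer count, and backbone size below are illustrative assumptions (roughly a HuBERT-Base-scale model), chosen only to show why the trainable fraction lands in the 1–5% range.

```python
import numpy as np

HIDDEN = 768      # assumed SSL hidden size (typical for wav2vec2/HuBERT-Base)
BOTTLENECK = 32   # assumed small adapter bottleneck width

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class Adapter:
    """Down-project -> nonlinearity -> up-project, with a residual connection.
    Only these weights would be trained; the SSL backbone stays frozen."""
    def __init__(self):
        self.w_down = rng.normal(0.0, 0.02, (HIDDEN, BOTTLENECK))
        self.w_up = np.zeros((BOTTLENECK, HIDDEN))  # zero init: starts as identity

    def __call__(self, h):
        # Residual keeps the frozen backbone's features intact at initialization.
        return h + relu(h @ self.w_down) @ self.w_up

    def num_params(self):
        return self.w_down.size + self.w_up.size

# Rough parameter budget: one adapter per layer vs. the frozen backbone.
layers = 12
backbone_params = 95_000_000  # ~95M, assumed backbone scale
adapter_params = layers * Adapter().num_params()
trainable_frac = adapter_params / backbone_params
print(f"trainable fraction: {trainable_frac:.2%}")  # well under 5%
```

The zero-initialized up-projection makes each adapter an identity map at the start of training, so adaptation begins from the frozen backbone's original features rather than perturbing them; the warm-start stage in the paper plays a complementary role by giving the adapter and downstream model a better starting point before the final fine-tuning.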