Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

📅 2024-06-20
🏛️ Interspeech
📈 Citations: 2
Influential: 1
🤖 AI Summary
To address the dual challenges of new-language enhancement and original-language capability preservation in cross-lingual adaptation of self-supervised speech models, this paper proposes a LoRA-driven framework for language-incremental expansion. Methodologically, it integrates Low-Rank Adaptation (LoRA) with a dual-track capability retention strategy: (i) multilingual data mixing during fine-tuning and (ii) k-means re-clustering to optimize the discrete representation space. The approach is instantiated on the mHuBERT architecture to enable efficient extension to Mandarin. Experiments demonstrate that only 0.3% of mHuBERT's parameters require tuning to integrate Mandarin, yielding a MOS improvement of about 1.6 and a relative WER reduction of up to 61.72%, while incurring no performance degradation on pre-existing language tasks. To our knowledge, this is the first work to systematically introduce LoRA into progressive multilingual expansion of self-supervised speech models, achieving a favorable trade-off among parameter efficiency, cross-lingual compatibility, and capability stability.
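The paper does not ship code, but the LoRA mechanism the summary describes can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the frozen weight `W` stands in for an mHuBERT projection, and the low-rank factors `A` and `B` (rank, alpha, and all names here are illustrative) are the only trainable parameters, which is how adaptation can touch only a small fraction of the model.

```python
import numpy as np

class LoRALinear:
    """Hypothetical LoRA-augmented linear layer (sketch, not mHuBERT code)."""

    def __init__(self, d_in, d_out, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01            # trainable down-projection
        self.B = np.zeros((d_out, rank))                             # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + (alpha/r) * B A x; with B zero-initialized, the adapted
        # layer starts out exactly equal to the frozen layer.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d_in=768, d_out=768, rank=8)
x = np.ones((2, 768))
y_frozen = x @ layer.W.T
assert np.allclose(layer(x), y_frozen)  # zero-init LoRA preserves original behavior
```

Because only `A` and `B` are updated, the trainable parameter count scales with the rank rather than with the layer size, which is the source of the parameter efficiency the summary highlights.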

📝 Abstract
Self-supervised learning (SSL) models have shown strong performance on various downstream tasks. However, they are typically developed for a limited set of languages and may encounter new languages in real-world deployment. Developing an SSL model for each new language is costly, so it is vital to efficiently adapt an existing SSL model to a new language without impairing its original abilities. We propose adaptation methods that integrate LoRA into existing SSL models to extend them to a new language. We also develop preservation strategies, including data combination and re-clustering, to retain abilities on existing languages. Applying these methods to mHuBERT, we investigate their effectiveness on the speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin), with the MOS value increased by about 1.6 and the relative WER reduced by up to 61.72%. Moreover, our preservation strategies ensure that performance on both existing and new languages remains intact.
Problem

Research questions and friction points this paper is trying to address.

Adapt SSL models to new languages efficiently
Maintain original performance on existing languages
Reduce development costs for multilingual expansion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating LoRA for multilingual SSL adaptation
Data combination and re-clustering preservation strategies
Applied to mHuBERT with speech resynthesis validation
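The preservation side can also be sketched. The idea, per the summary, is to pool features from the existing and new languages (data combination) and re-fit k-means so the discrete units cover both. The snippet below is a toy NumPy version under stated assumptions: synthetic 16-dimensional "features" stand in for SSL encoder outputs, and the plain k-means loop is illustrative, not the paper's clustering pipeline.

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Plain k-means (illustrative): returns cluster centers and labels."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center
        dist = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # move each center to the mean of its assigned features
        for j in range(k):
            pts = feats[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

# Data combination: pool (synthetic) features from existing and new languages,
# then re-cluster so the unit inventory represents both.
old_lang = np.random.default_rng(1).standard_normal((200, 16))
new_lang = np.random.default_rng(2).standard_normal((200, 16)) + 3.0
combined = np.vstack([old_lang, new_lang])
centers, labels = kmeans(combined, k=4)
assert centers.shape == (4, 16)
```

Re-clustering over the combined feature pool is what keeps the discrete token inventory valid for the original languages while making room for the new one.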
Jing Xu
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR, China
Minglin Wu
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR, China
Xixin Wu
The Chinese University of Hong Kong
Helen M. Meng
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR, China; Centre for Perceptual and Interactive Intelligence (CPII) Limited, HKSAR, China