🤖 AI Summary
This study addresses the poor performance of speech foundation models on extremely low-resource Indigenous Pacific languages and the catastrophic forgetting induced by full fine-tuning. The authors propose a multilingual sequential learning framework to systematically evaluate full fine-tuning against Low-Rank Adaptation (LoRA) for continual speech adaptation. Their analysis reveals critical challenges in continual learning for distantly related, low-resource languages, including representational drift and the stability-plasticity trade-off. While LoRA demonstrates initial effectiveness, it still suffers from significant forgetting over multilingual sequential training. This work underscores both the urgent need for, and the substantial difficulty of, developing robust, sustainable adaptation strategies for marginalized languages in continual learning settings.
📝 Abstract
Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity, and full fine-tuning on new languages risks catastrophic forgetting. To address this gap, we present an empirical study adapting speech foundation models to real-world Pacific datasets, investigating how data volume and linguistic features affect adaptation success. We evaluate adaptation strategies including full fine-tuning and Low-Rank Adaptation (LoRA), and we analyze a continual learning framework for sequentially acquiring multiple languages. We demonstrate that adapting to these distant languages causes severe internal representational drift, leaving the models facing a strict stability-plasticity dilemma. While LoRA adapts well initially, it suffers from catastrophic forgetting during sequential learning. This study highlights the urgent need for robust adaptation strategies tailored to underrepresented languages.
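For readers unfamiliar with how LoRA differs from full fine-tuning in practice, the sketch below shows one way such an adaptation could be wired up with the Hugging Face `peft` library. Everything here is an illustrative assumption rather than the paper's actual setup: the base checkpoint (`openai/whisper-small`), the LoRA hyperparameters, and the `pacific_datasets`, `train_one_language`, and `evaluate_all_languages` names are placeholders standing in for the study's data and training loop.

```python
# Minimal sketch of LoRA adaptation for a speech foundation model,
# followed by a simplified sequential (continual) learning loop.
# Base model, hyperparameters, and helper names are hypothetical.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_cfg = LoraConfig(
    r=16,                                  # low-rank dimension (illustrative value)
    lora_alpha=32,                         # LoRA scaling factor (illustrative value)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()          # only the small LoRA matrices are trainable

# Sequential adaptation over several languages: each new language is trained
# in turn, and performance on earlier languages is re-measured afterwards,
# which is how catastrophic forgetting shows up in this kind of study.
for lang, dataset in pacific_datasets.items():   # hypothetical {language: dataset} dict
    train_one_language(model, dataset)           # hypothetical per-language training loop
    evaluate_all_languages(model)                # track forgetting after each stage
```

The structural point of the sketch is that only the low-rank adapter matrices are updated while the backbone stays frozen, which is why LoRA is expected to be gentler than full fine-tuning; the paper's finding is that forgetting nonetheless accumulates as languages are added sequentially.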