🤖 AI Summary
This work addresses low-resource multilingual speech recognition, which is constrained primarily by data scarcity and inefficient adaptation. The authors propose DAMA, a depth-aware adaptation framework that reveals and exploits a U-shaped adaptability pattern within multilingual ASR models: it allocates adaptation capacity to shallow and deep layers while freezing intermediate layers to preserve shared semantic representations. By integrating SVD-based initialization with parameter-efficient fine-tuning, DAMA achieves state-of-the-art or comparable performance across 18 low-resource languages using only 20% of the trainable parameters. Notably, it reduces word error rates by up to 29% under extreme data scarcity while substantially cutting memory usage, training time, and computational cost.
📝 Abstract
Recent speech foundation models excel at multilingual automatic speech recognition (ASR) for high-resource languages, but adapting them to low-resource languages remains challenging due to data scarcity and efficiency constraints. Full-model fine-tuning is computationally expensive and prone to overfitting, while parameter-efficient methods like LoRA apply adaptation uniformly across layers, overlooking differences in internal representations and thus compromising both effectiveness and efficiency. We analyze multilingual ASR models and reveal a U-shaped adaptability pattern: early and late layers are language-specific and require more adaptation, while intermediate layers retain shared semantics and need less. Building on this observation, we propose DAMA, a Depth-Aware Model Adaptation framework that allocates adaptation capacity according to each layer's role. DAMA also introduces Singular Value Decomposition (SVD)-based initialization to constrain adaptation and preserve the U-shaped pattern, as well as a frozen middle-layer basis for further efficiency. Evaluated on 18 low-resource languages across two benchmark datasets, DAMA matches or surpasses state-of-the-art accuracy with 80% fewer trainable parameters, achieves a 29% error reduction under extreme data scarcity, and significantly improves memory, training-time, and computational efficiency over baselines. These results highlight the benefits of structure-aware adaptation for efficient, scalable multilingual ASR.
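The two core ideas in the abstract — a U-shaped, depth-dependent allocation of adaptation capacity and SVD-based initialization of low-rank adapters — can be sketched as below. This is a minimal illustration, not the paper's exact formulation: the helper names (`depth_aware_ranks`, `svd_init_lora`), the linear rank schedule, and the specific rank values are assumptions for demonstration.

```python
import numpy as np

def depth_aware_ranks(num_layers, r_max=16, r_min=0):
    """Illustrative U-shaped rank schedule: larger LoRA rank at shallow and
    deep layers, rank 0 (i.e., frozen) toward the middle of the stack."""
    mid = (num_layers - 1) / 2
    ranks = []
    for i in range(num_layers):
        # Normalized distance from the middle layer, in [0, 1].
        d = abs(i - mid) / mid if mid > 0 else 1.0
        ranks.append(int(round(r_min + (r_max - r_min) * d)))
    return ranks

def svd_init_lora(W, r):
    """Initialize LoRA factors (B, A) from the top-r singular directions of a
    pretrained weight W, so the adapter starts aligned with W's dominant
    subspace rather than at random."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    scale = np.sqrt(S[:r])
    B = U[:, :r] * scale            # shape (out_dim, r)
    A = scale[:, None] * Vt[:r]     # shape (r, in_dim)
    return B, A

ranks = depth_aware_ranks(num_layers=8)   # e.g., [16, 11, 7, 2, 2, 7, 11, 16]
W = np.random.randn(4, 6)
B, A = svd_init_lora(W, r=2)              # rank-2 adapter for one layer
```

With a full-rank `r`, `B @ A` reproduces `W` exactly; truncating to a small `r` keeps only the dominant directions, which is what constrains adaptation in the SVD-initialized setting described above.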