A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR

📅 2026-01-02

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of deploying multilingual speech recognition on edge devices, where large-scale models are too computationally expensive and many systems depend on explicit language identifiers. The authors propose a lightweight, end-to-end CTC architecture that integrates an mHuBERT backbone with a hierarchical LoRA-MoE module (HLoRA). Language-agnostic single-pass decoding is achieved by routing among LoRA experts according to language identification (LID) posteriors, so no language label is needed at inference time. The hierarchy pairs a shared multilingual LoRA, which learns language-invariant acoustic representations, with language-specific LoRA experts that model language-dependent characteristics. Evaluated on the MSR-86K and MLC-SLM 2025 Challenge datasets, the method achieves performance competitive with state-of-the-art two-stage inference systems while decoding in a single pass, which improves decoding efficiency.

📝 Abstract

Large-scale multilingual ASR (mASR) models such as Whisper achieve strong performance but incur high computational and latency costs, limiting their deployment on resource-constrained edge devices. In this study, we propose a lightweight and language-agnostic multilingual ASR system based on a CTC architecture with domain adaptation. Specifically, we introduce a Language-agnostic Hierarchical LoRA-MoE (HLoRA) framework integrated into an mHuBERT-CTC model, enabling end-to-end decoding via LID-posterior-driven LoRA routing. The hierarchical design consists of a multilingual shared LoRA for learning language-invariant acoustic representations and language-specific LoRA experts for modeling language-dependent characteristics. The proposed routing mechanism removes the need for prior language identity information or explicit language labels during inference, achieving true language-agnostic decoding. Experiments on MSR-86K and the MLC-SLM 2025 Challenge datasets demonstrate that HLoRA achieves competitive performance with state-of-the-art two-stage inference methods using only single-pass decoding, significantly improving decoding efficiency for low-resource mASR applications.
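
The abstract describes the hierarchical design only at a high level, so the following is a minimal NumPy sketch of how a shared LoRA and LID-posterior-weighted language-specific LoRA experts might be combined in one linear layer. Everything here is an assumption made for illustration rather than the authors' implementation: the class name `HLoRALinear`, the soft weighted sum over experts (the paper could equally use top-1 selection), a single utterance-level LID posterior, and the `alpha / rank` scaling borrowed from the usual LoRA convention.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class HLoRALinear:
    """Hypothetical layer: frozen weight + shared LoRA + per-language LoRA experts."""

    def __init__(self, d_in, d_out, n_langs, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W0 = rng.normal(0.0, 0.02, (d_out, d_in))  # stands in for a frozen backbone weight
        self.scale = alpha / rank
        # Usual LoRA init: A is random, B is zero, so every adapter starts as a no-op.
        self.A_shared = rng.normal(0.0, 0.02, (rank, d_in))
        self.B_shared = np.zeros((d_out, rank))
        self.A_lang = rng.normal(0.0, 0.02, (n_langs, rank, d_in))
        self.B_lang = np.zeros((n_langs, d_out, rank))

    def forward(self, x, lid_posterior):
        """x: (T, d_in) frames; lid_posterior: (n_langs,) weights summing to 1."""
        base = x @ self.W0.T
        shared = (x @ self.A_shared.T) @ self.B_shared.T
        # Low-rank update of every language expert, shape (n_langs, T, d_out).
        low = np.einsum("td,lrd->ltr", x, self.A_lang)
        per_lang = np.einsum("ltr,lor->lto", low, self.B_lang)
        # Assumed routing rule: weight each expert by its LID posterior and sum.
        routed = np.einsum("l,lto->to", lid_posterior, per_lang)
        return base + self.scale * (shared + routed)


# Usage: the logits would come from an LID head; these values are made up.
layer = HLoRALinear(d_in=16, d_out=16, n_langs=3)
frames = np.random.default_rng(1).normal(size=(5, 16))
posterior = softmax(np.array([2.0, 0.1, -1.0]))
out = layer.forward(frames, posterior)
print(out.shape)
```

Because the routing weights come from the model's own LID posterior, no language label has to be supplied from outside, which is the property the abstract calls language-agnostic decoding. With a one-hot posterior the sketch reduces to applying a single language expert on top of the shared LoRA.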

Problem

Research questions and friction points this paper is trying to address.

multilingual ASR

computational efficiency

language-agnostic decoding

edge deployment

latency reduction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-Agnostic

Hierarchical LoRA-MoE

CTC-based ASR