MLMA: Towards Multilingual with Mamba Based Architectures

📅 2025-10-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multilingual automatic speech recognition (ASR) suffers from performance disparity between high-resource and low-resource languages. To address this, we propose the first integration of the Mamba state space model into multilingual ASR, replacing the conventional Transformer architecture. Leveraging Mamba’s linear-time complexity and superior long-range dependency modeling, our approach establishes an efficient, scalable, unified ASR framework. Crucially, it incorporates an implicit language-aware mechanism and shared cross-lingual representations to significantly improve modeling of low-resource languages. Evaluated on standard multilingual benchmarks—including MLS and CommonVoice—our method achieves competitive word error rates relative to state-of-the-art Transformer-based models, while accelerating inference by approximately 2.3× and reducing GPU memory consumption by 40%. This work introduces a novel paradigm for efficient and equitable multilingual ASR, advancing both computational efficiency and linguistic fairness.

Technology Category

Application Category

📝 Abstract
Multilingual automatic speech recognition (ASR) remains a challenging task, especially when balancing performance across high- and low-resource languages. Recent advances in sequence modeling suggest that architectures beyond Transformers may offer better scalability and efficiency. In this work, we introduce MLMA (Multilingual Language Modeling with Mamba for ASR), a new approach that leverages the Mamba architecture--an efficient state-space model optimized for long-context sequence processing--for multilingual ASR. Using Mamba, MLMA implicitly incorporates language-aware conditioning and shared representations to support robust recognition across diverse languages. Experiments on standard multilingual benchmarks show that MLMA achieves competitive performance compared to Transformer-based architectures. These results highlight Mamba's potential as a strong backbone for scalable, efficient, and accurate multilingual speech recognition.
Problem

Research questions and friction points this paper is trying to address.

Addressing multilingual ASR performance imbalance across resource levels
Exploring Mamba architecture for scalable multilingual speech recognition
Enhancing cross-linguistic robustness through shared language representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Mamba architecture for multilingual ASR
Implicitly incorporates language-aware conditioning mechanisms
Provides efficient state-space model for long sequences
🔎 Similar Papers
No similar papers found.
M
Mohamed Nabih Ali
Center for Augmented Intelligence, Fondazione Bruno Kessler, Trento, Italy
D
Daniele Falavigna
Center for Augmented Intelligence, Fondazione Bruno Kessler, Trento, Italy
Alessio Brutti
Alessio Brutti
FBK
audio/speech processing