Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the limited generalization of current code-switching automatic speech recognition (CS-ASR) systems to unseen language pairs, a challenge exacerbated by the scarcity of code-switched speech data and the combinatorial growth in the number of possible language pairs. For the first time, this work systematically evaluates the cross-language-pair generalization capability of CS-ASR models and introduces a paradigm that combines model merging with domain generalization. By merging multiple bilingual CS-ASR models without relying on pair-specific fine-tuning or synthetic data, the proposed approach improves scalability and achieves modest generalization to unseen language pairs. The experiments show that cross-lingual transfer in CS-ASR is feasible, but that bilingual code-switching capabilities transfer only to a limited extent across language pairs.

📝 Abstract

Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets. Nevertheless, these approaches face an inherent scalability limitation, as support for CS must be developed separately for language pairs whose number grows combinatorially with the number of supported languages. In this work, we investigate whether CS capabilities learned from a limited set of seen language pairs can generalize to unseen language pairs through model merging and domain generalization methods. Our experiments show that merged bilingual CS-ASR models modestly generalize to unseen language pairs, suggesting limited transfer of bilingual CS capabilities across language pairs.
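
To make the scalability argument and the idea of model merging concrete, here is a minimal sketch. It is not the paper's method: the abstract does not say which merging technique is used, so uniform (or weighted) parameter averaging is assumed here purely for illustration. The function names `count_language_pairs` and `merge_state_dicts` are hypothetical, and model parameters are represented as plain dictionaries of float lists rather than real ASR checkpoints.

```python
def count_language_pairs(num_languages: int) -> int:
    """Number of unordered language pairs among num_languages languages."""
    return num_languages * (num_languages - 1) // 2


def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of same-shaped parameter dictionaries.

    ASSUMPTION: simple parameter averaging stands in for whatever merging
    method the paper actually uses. Each state dict maps a parameter name
    to a flat list of floats.
    """
    if not state_dicts:
        raise ValueError("need at least one model to merge")
    if weights is None:
        # Default to a uniform average over all models.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight per model is required")

    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("all models must share the same parameter names")

    merged = {}
    for name in names:
        size = len(state_dicts[0][name])
        if any(len(sd[name]) != size for sd in state_dicts):
            raise ValueError(f"parameter {name!r} has mismatched sizes")
        # Element-wise weighted sum across models.
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(size)
        ]
    return merged


# Toy stand-ins for two bilingual CS-ASR models (hypothetical values).
model_pair_a = {"encoder.w": [1.0, 2.0], "decoder.w": [0.0]}
model_pair_b = {"encoder.w": [3.0, 4.0], "decoder.w": [2.0]}
merged_model = merge_state_dicts([model_pair_a, model_pair_b])
```

With 10 supported languages, `count_language_pairs(10)` gives 45 pairs, which is why building one fine-tuned model per pair scales poorly. Merging, by contrast, produces a single set of parameters from a few bilingual models, and the open question the paper studies is how well such a merged model handles pairs it never saw.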

Problem

Research questions and friction points this paper is trying to address.

code-switching ASR

unseen language pairs

multilingual speech recognition

scalability

language generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

code-switching ASR

model merging

domain generalization