Improving Code Switching with Supervised Fine Tuning and GELU Adapters

📅 2025-05-30

🤖 AI Summary
To address the scarcity of code-switched speech data and the difficulty of directly transferring large monolingual models to low-resource code-switching ASR, this paper proposes a two-stage adaptation framework. First, language-aware “switching tokenizers” leverage Whisper’s monolingual tokenization capability to perform fine-grained language-boundary segmentation on code-switched text. Second, lightweight GELU-activated adapters are inserted into Whisper’s encoder and optimized via supervised fine-tuning for efficient transfer. Crucially, the method requires no multilingual pretraining and is the first to precisely adapt monolingual large models to code-switching ASR. Evaluated on the ASCEND, SEAME devman, and SEAME devsge test sets, it achieves mixed error rates (MER) of 9.4%, 6.0%, and 9.7%, respectively, substantially outperforming existing state-of-the-art approaches.

📝 Abstract
Few code-switching datasets, labeled or unlabeled, exist today. As a result, ASR requires new methods that make use of the vast monolingual data and models already available. This paper uses OpenAI's open-source ASR model, Whisper, which has been pre-trained on 680K hours of audio to perform monolingual ASR tasks. In Part 1, this paper examines how exploiting Whisper's monolingual ability to tokenize the training text of each language individually, called the "Switching Tokenizers Method", improves transcription accuracy. In Part 2, we combine the Switching Tokenizers Method from Part 1 with a GELU-based adapter trained on the encoder. Together, these two methods reduce Total Mixed Error Rate (MER) to 9.4% on the ASCEND dataset, 6% on SEAME devman, and 9.7% on SEAME devsge, outperforming current SoTA methods.
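
The idea of tokenizing each language's text with its own tokenizer can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes Mandarin-English text, detects language boundaries by Unicode script alone, and uses toy stand-in tokenizers where a real pipeline would call Whisper's tokenizer configured for each language. The names `split_by_language`, `switching_tokenize`, and `toy_tokenizers` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def is_cjk(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Split text into (language, segment) runs at script boundaries.

    Assumption: only "zh" and "en" occur. Spaces, digits, and punctuation
    are treated as neutral and stay attached to the run they follow.
    """
    runs: List[Tuple[str, str]] = []
    current_lang = None
    buffer = ""
    for ch in text:
        if is_cjk(ch):
            lang = "zh"
        elif ch.isascii() and ch.isalpha():
            lang = "en"
        else:
            lang = None  # neutral character
        if lang is not None and current_lang is not None and lang != current_lang:
            runs.append((current_lang, buffer))
            buffer = ""
        if lang is not None:
            current_lang = lang
        buffer += ch
    if buffer:
        runs.append((current_lang or "en", buffer))
    return runs


def switching_tokenize(
    text: str, tokenizers: Dict[str, Callable[[str], List[str]]]
) -> List[str]:
    """Tokenize each run with the tokenizer registered for its language."""
    tokens: List[str] = []
    for lang, segment in split_by_language(text):
        tokens.extend(tokenizers[lang](segment))
    return tokens


# Toy stand-ins (hypothetical): characters for Mandarin, words for English.
toy_tokenizers = {
    "zh": lambda s: [c for c in s if not c.isspace()],
    "en": lambda s: s.split(),
}

tokens = switching_tokenize("我想要 order 一个 pizza", toy_tokenizers)
```

Concatenating the segments returned by `split_by_language` reproduces the input string, so no text is lost at the language boundaries.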
Problem

Research questions and friction points this paper is trying to address.

Addressing lack of labeled code-switching datasets for ASR
Improving transcription accuracy using Whisper's monolingual tokenization
Reducing Mixed Error Rate with GELU adapters and tokenizer methods
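
As a rough illustration of the metric named above, the sketch below computes a mixed error rate as token-level edit distance divided by reference length. It assumes a common convention for Mandarin-English data, where Mandarin is counted per character and English per word. The summary does not state the paper's exact scoring or text normalization, so `mixed_tokens` and `mixed_error_rate` are hypothetical helpers, not the authors' evaluation script.

```python
import re
from typing import List


def mixed_tokens(text: str) -> List[str]:
    """Mandarin characters and lowercased English words as scoring units."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z']+", text.lower())


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def mixed_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over mixed tokens, normalized by reference length."""
    ref = mixed_tokens(reference)
    hyp = mixed_tokens(hypothesis)
    if not ref:
        raise ValueError("reference contains no scorable tokens")
    return edit_distance(ref, hyp) / len(ref)


# One dropped Mandarin character plus one wrong English word: 2 errors / 7 tokens.
mer = mixed_error_rate("我想要 order 一个 pizza", "我想 order 一个 pizzas")
```
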
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Whisper's monolingual tokenization for code switching
Combines tokenizer method with GELU adapter training
Reduces error rates below current state-of-the-art
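
For readers unfamiliar with adapters, here is a minimal dependency-free sketch of a bottleneck adapter with a GELU nonlinearity: a down-projection, GELU, an up-projection, and a residual connection. The summary only says that a GELU-based adapter is trained on Whisper's encoder, so the bottleneck shape, the zero initialization of the up-projection, and the class name `GeluAdapter` are all assumptions made for illustration. A real implementation would use a tensor library and attach the module inside the encoder layers.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class GeluAdapter:
    """Residual bottleneck adapter: h + W_up * gelu(W_down * h)."""

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection (assumed initialization).
        self.w_down: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(d_bottleneck)
        ]
        # Zero up-projection so the adapter starts as the identity map
        # (a common choice, assumed here rather than taken from the paper).
        self.w_up: Matrix = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h: Vector) -> Vector:
        z = [gelu(v) for v in matvec(self.w_down, h)]
        delta = matvec(self.w_up, z)
        return [hi + di for hi, di in zip(h, delta)]


adapter = GeluAdapter(d_model=4, d_bottleneck=2)
hidden = [0.5, -1.0, 2.0, 0.0]
output = adapter(hidden)  # equals `hidden` until w_up is trained away from zero
```

Because the adapter adds only the two small projection matrices per insertion point, the number of new trainable parameters stays far below the size of the base model, which is what makes this kind of module "lightweight".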