SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR

📅 2025-06-10

🤖 AI Summary

Domain shift in real-world conditions significantly degrades end-to-end automatic speech recognition (ASR) performance. Test-time adaptation (TTA) and external language model (LM) rescoring are two prominent robustness strategies, but TTA can interfere with LM rescoring, so combining them is nontrivial. This paper identifies this previously overlooked conflict and proposes SUTA-LM, which extends SUTA, an entropy-minimization-based TTA method, with an auto-step selection mechanism guided by both acoustic and linguistic information, followed by LM rescoring. Evaluated on 18 diverse ASR datasets, SUTA-LM achieves robust results across a wide range of domains without requiring labeled target-domain data.

📝 Abstract

Despite progress in end-to-end ASR, real-world domain mismatches still cause performance drops, which Test-Time Adaptation (TTA) aims to mitigate by adjusting models during inference. Recent work explores combining TTA with external language models, using techniques like beam search rescoring or generative error correction. In this work, we identify a previously overlooked challenge: TTA can interfere with language model rescoring, revealing the nontrivial nature of effectively combining the two methods. Based on this insight, we propose SUTA-LM, a simple yet effective extension of SUTA, an entropy-minimization-based TTA approach, with language model rescoring. SUTA-LM first applies a controlled adaptation process guided by an auto-step selection mechanism leveraging both acoustic and linguistic information, followed by language model rescoring to refine the outputs. Experiments on 18 diverse ASR datasets show that SUTA-LM achieves robust results across a wide range of domains.
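The entropy-minimization idea behind SUTA can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single hypothetical adaptable parameter (a scale applied to per-frame logits) and uses a numerical gradient, whereas a real system would update model parameters through backpropagation. It only shows that gradient steps on the mean frame entropy make the output distributions more confident.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_entropy(frames, scale):
    """Average Shannon entropy (in nats) of per-frame posteriors after scaling the logits."""
    total = 0.0
    for logits in frames:
        probs = softmax([scale * x for x in logits])
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(frames)

def adapt(frames, steps=5, lr=0.5, eps=1e-4):
    """Run a few gradient-descent steps on the entropy objective.

    `scale` is a hypothetical stand-in for the model parameters that a
    test-time adaptation method would update on the test utterance itself.
    """
    scale = 1.0
    history = [mean_entropy(frames, scale)]
    for _ in range(steps):
        # Central-difference estimate of d(entropy)/d(scale).
        grad = (mean_entropy(frames, scale + eps) - mean_entropy(frames, scale - eps)) / (2 * eps)
        scale -= lr * grad
        history.append(mean_entropy(frames, scale))
    return scale, history

# Toy per-frame logits over a three-symbol vocabulary (made-up values).
frames = [[2.0, 1.0, 0.5], [0.2, 1.5, 1.0], [1.0, 1.1, 3.0]]
scale, history = adapt(frames)
```

Each entry of `history` is the mean entropy after one more step. Because the objective never looks at labels, nothing in the loop itself says when to stop, which is the gap an auto-step selection mechanism is meant to fill.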

Problem

Research questions and friction points this paper is trying to address.

Addressing domain mismatch in ASR via test-time adaptation

Mitigating interference between TTA and language model rescoring

Enhancing ASR robustness across diverse domains with SUTA-LM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines TTA with language model rescoring

Uses auto-step selection for controlled adaptation

Leverages acoustic and linguistic information
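The two-stage flow listed above (controlled adaptation with auto-step selection, then LM rescoring) might be sketched as follows. The selection rule here is an assumption made for illustration: the source only says that the mechanism uses both acoustic and linguistic information, so the combined score, the `lm_weight` parameter, the `Hypothesis` type, and all numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    acoustic_logp: float  # log-probability from the (adapted) ASR model
    lm_logp: float        # log-probability from the external language model

def combined(h, lm_weight):
    """Acoustic score plus weighted LM score (assumed scoring rule)."""
    return h.acoustic_logp + lm_weight * h.lm_logp

def select_step(nbest_per_step, lm_weight=0.5):
    """Pick the adaptation step whose acoustically best hypothesis
    also scores best once the LM score is added."""
    def top1_score(nbest):
        top1 = max(nbest, key=lambda h: h.acoustic_logp)
        return combined(top1, lm_weight)
    return max(range(len(nbest_per_step)), key=lambda i: top1_score(nbest_per_step[i]))

def rescore(nbest, lm_weight=0.5):
    """Standard N-best rescoring: return the hypothesis with the best combined score."""
    return max(nbest, key=lambda h: combined(h, lm_weight))

def adapt_then_rescore(nbest_per_step, lm_weight=0.5):
    """Choose an adaptation step, then rescore that step's N-best list."""
    step = select_step(nbest_per_step, lm_weight)
    return step, rescore(nbest_per_step[step], lm_weight)

# Made-up N-best lists after 0, 1, and 2 adaptation steps.
nbest_per_step = [
    [Hypothesis("the cat sad", -4.0, -9.0), Hypothesis("the cat sat", -4.2, -5.0)],
    [Hypothesis("the cat sat", -3.5, -5.0), Hypothesis("the cat sad", -3.9, -9.0)],
    [Hypothesis("the cats at", -2.0, -12.0), Hypothesis("the cat sat", -6.0, -5.0)],
]
step, best = adapt_then_rescore(nbest_per_step)
```

In this toy data the last step has the most confident acoustic score but a poor LM score, so the sketch selects step 1 and returns "the cat sat". It is meant only to show why a stopping rule that looks at linguistic as well as acoustic evidence can differ from one driven by acoustic confidence alone.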
