ASR Under Noise: Exploring Robustness for Sundanese and Javanese

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates the noise robustness of Whisper models for Javanese and Sundanese, two under-resourced regional languages of Indonesia, addressing a gap in automatic speech recognition (ASR) evaluation under realistic noisy conditions. To tackle data scarcity and dialectal variability, we propose a noise-aware training framework integrating multi-SNR synthetic noise augmentation, SpecAugment, and cross-domain data utilization. Experimental results show that noise-aware fine-tuning reduces word error rate (WER) by 28.6% on average across both languages for large-scale Whisper models (e.g., whisper-large), significantly outperforming standard baselines. Error analysis identifies phonological ambiguity and dialectal variation as primary sources of degradation. To our knowledge, this is the first systematic validation that noise robustness from large pre-trained ASR models can be effectively transferred to low-resource Indonesian regional languages. The proposed framework offers a reusable training paradigm for robust low-resource speech recognition, with implications for real-world deployment in acoustically challenging environments.
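
Since the summary reports its headline result as a WER reduction, a minimal sketch of how WER is conventionally computed may help: word-level edit distance divided by the number of reference words. The function names below are hypothetical and not taken from the paper. The summary does not say whether the 28.6% figure is a relative or an absolute reduction, so `relative_wer_reduction` shows the relative form purely as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which WER dropped relative to the baseline (assumed form)."""
    return (baseline_wer - improved_wer) / baseline_wer


# Placeholder tokens, not real Javanese or Sundanese transcripts.
wer = word_error_rate("a b c d", "a x c")
print(f"WER: {wer:.2f}")
print(f"Relative reduction: {relative_wer_reduction(0.40, 0.30):.2%}")
```

In practice an evaluation would also normalize casing and punctuation before scoring; that step is omitted here.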

📝 Abstract

We investigate the robustness of Whisper-based automatic speech recognition (ASR) models for two major Indonesian regional languages: Javanese and Sundanese. While recent work has demonstrated strong ASR performance under clean conditions, the effectiveness of these models in noisy environments remains unclear. To address this, we experiment with multiple training strategies, including synthetic noise augmentation and SpecAugment, and evaluate performance across a range of signal-to-noise ratios (SNRs). Our results show that noise-aware training substantially improves robustness, particularly for larger Whisper models. A detailed error analysis further reveals language-specific challenges, highlighting avenues for future improvements.
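
To make the idea of synthetic noise augmentation at a chosen SNR concrete, here is a minimal sketch of the standard approach: scale a noise signal so that the ratio of speech power to noise power matches the target, then add it to the clean waveform. The function name `mix_at_snr`, the test signals, and the SNR values are illustrative assumptions; the abstract does not specify the noise sources or SNR levels used.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    # Repeat the noise if it is shorter than the speech, then trim to match.
    if len(noise) < len(speech):
        repeats = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, repeats)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent")

    # SNR_dB = 10 * log10(speech_power / scaled_noise_power)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


# Stand-in signals: a 220 Hz tone as "speech" and white noise (hypothetical data).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)

for snr_db in (20, 10, 0):  # illustrative levels only
    noisy = mix_at_snr(speech, noise, snr_db)
    measured = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
    print(f"target {snr_db} dB, measured {measured:.2f} dB")
```

Sampling the SNR at random for each training utterance is one common way to expose a model to a range of conditions, though whether the authors do this is not stated here.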

Problem

Research questions and friction points this paper is trying to address.

Evaluating Whisper ASR robustness for Javanese and Sundanese languages

Assessing ASR performance degradation in noisy environments

Addressing language-specific challenges through noise-aware training strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using synthetic noise augmentation for training

Applying SpecAugment to improve model robustness

Evaluating performance across various SNR levels
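
The SpecAugment item above can be illustrated with a small sketch of its two masking operations, applied to a log-mel spectrogram of shape (mel bins, frames). This is a simplified assumption-laden version: it omits the time-warping step of the original SpecAugment recipe, fills masked regions with the spectrogram mean, and uses hypothetical parameter names and default widths that are not taken from the paper.

```python
import numpy as np


def spec_augment(
    log_mel: np.ndarray,
    rng: np.random.Generator,
    num_freq_masks: int = 2,
    max_freq_width: int = 8,
    num_time_masks: int = 2,
    max_time_width: int = 20,
) -> np.ndarray:
    """Return a copy of `log_mel` with random frequency and time bands masked."""
    out = log_mel.copy()
    n_mels, n_frames = out.shape
    fill = out.mean()  # masked cells are set to the global mean (a design choice)

    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
        start = int(rng.integers(0, n_mels - width + 1))
        out[start : start + width, :] = fill  # mask a band of mel bins

    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, n_frames) + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        out[:, start : start + width] = fill  # mask a span of frames

    return out


# Random features standing in for a real log-mel spectrogram.
rng = np.random.default_rng(0)
features = rng.standard_normal((80, 300))
augmented = spec_augment(features, rng)
print("masked cells:", int(np.sum(augmented != features)))
```

The function copies its input, so the clean features remain available for comparison or for a second, differently masked view.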

🔎 Similar Papers

Comparative study on noise-augmented training and its effect on adversarial robustness in ASR systems