ASR Under Noise: Exploring Robustness for Sundanese and Javanese

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the noise robustness of Whisper models for Javanese and Sundanese, two under-resourced regional languages of Indonesia, addressing a gap in automatic speech recognition (ASR) evaluation under realistic noisy conditions. To tackle data scarcity and dialectal variability, we propose a noise-aware training framework that combines synthetic noise augmentation at multiple signal-to-noise ratios (SNRs), SpecAugment, and cross-domain data. Experimental results show that noise-aware fine-tuning reduces word error rate (WER) by 28.6% on average across both languages for large Whisper models (e.g., whisper-large), significantly outperforming standard baselines. Error analysis identifies phonological ambiguity and dialectal variation as the primary sources of degradation. To our knowledge, this is the first systematic validation that the noise robustness of large pre-trained ASR models can be effectively transferred to low-resource Indonesian regional languages. The proposed framework offers a reusable training recipe for robust low-resource speech recognition, with implications for real-world deployment in acoustically challenging environments.
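The multi-SNR noise augmentation described above can be illustrated with a minimal sketch. It assumes mono waveforms stored as Python lists of floats at the same sample rate. The helper names and the SNR grid `SNR_CHOICES_DB` are hypothetical, because the paper's actual SNR values are not given here. The noise is looped or trimmed to the speech length, scaled so that the speech-to-noise power ratio matches a sampled SNR, and then added to the speech.

```python
import math
import random

# Hypothetical SNR grid in dB; NOT the values used in the paper.
SNR_CHOICES_DB = [0, 5, 10, 15, 20]


def mean_power(x):
    """Mean power (average of squared samples) of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add it."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise if it is shorter than the speech, then trim to length.
    reps = math.ceil(len(speech) / len(noise))
    noise = (noise * reps)[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise signal is silent")
    # Solve SNR_dB = 10 * log10(p_speech / (alpha**2 * p_noise)) for alpha.
    alpha = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + alpha * n for s, n in zip(speech, noise)]


def augment(speech, noise, rng=random):
    """Multi-SNR augmentation: draw one SNR per utterance and mix at that level."""
    snr_db = rng.choice(SNR_CHOICES_DB)
    return mix_at_snr(speech, noise, snr_db), snr_db
```

Sampling a fresh SNR for each utterance is intended to expose the model to both mild and severe noise during fine-tuning. A single fixed SNR would cover only one noise condition.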

📝 Abstract
We investigate the robustness of Whisper-based automatic speech recognition (ASR) models for two major Indonesian regional languages: Javanese and Sundanese. While recent work has demonstrated strong ASR performance under clean conditions, the effectiveness of these models in noisy environments remains unclear. To address this, we experiment with multiple training strategies, including synthetic noise augmentation and SpecAugment, and evaluate performance across a range of signal-to-noise ratios (SNRs). Our results show that noise-aware training substantially improves robustness, particularly for larger Whisper models. A detailed error analysis further reveals language-specific challenges, highlighting avenues for future improvements.
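Evaluating at a given SNR requires scaling a noise signal relative to the speech. The formula below uses the standard power-based definition. The abstract does not state the exact formula the authors used, so treat it as an assumption. Here $s_t$ are the speech samples, $n_t$ the noise samples over $T$ samples, and $\alpha$ a scale factor applied to the noise:

```latex
P_s = \frac{1}{T}\sum_{t=1}^{T} s_t^2, \qquad
P_n = \frac{1}{T}\sum_{t=1}^{T} n_t^2, \qquad
\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{P_s}{\alpha^2 P_n}

\alpha = \sqrt{\frac{P_s}{P_n \cdot 10^{\mathrm{SNR}_{\mathrm{dB}}/10}}}, \qquad
x_t = s_t + \alpha\, n_t
```

For example, at 0 dB the scaled noise has the same power as the speech. At 10 dB its power is one tenth of the speech power.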
Problem

Evaluating Whisper ASR robustness for Javanese and Sundanese languages
Assessing ASR performance degradation in noisy environments
Addressing language-specific challenges through noise-aware training strategies
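Degradation in ASR output is usually measured with word error rate, WER = (S + D + I) / N. Here S, D, and I are the word substitutions, deletions, and insertions in a minimum edit-distance alignment, and N is the number of reference words. The sketch below assumes whitespace tokenization and no text normalization, since the paper's normalization rules are not stated here. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER via word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In this example one substitution ("sat" to "sit") and one deletion ("the") give 2 errors over 6 reference words. Because insertions are counted, WER can exceed 1.0.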
Innovation

Using synthetic noise augmentation for training
Applying SpecAugment to improve model robustness
Evaluating performance across various SNR levels
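SpecAugment masks random blocks of a log-mel spectrogram during training. The sketch below implements a simplified version that covers only frequency masking and time masking. It omits time warping and the cap on time-mask width relative to utterance length. The spectrogram is stored as a list of frames (T × F). The default mask counts and widths are illustrative values in the range of the original SpecAugment paper, not the settings used in this work. `spec_augment` is a hypothetical helper name.

```python
import random


def spec_augment(spec, num_freq_masks=2, max_freq_width=27,
                 num_time_masks=2, max_time_width=100,
                 rng=random, fill=0.0):
    """Simplified SpecAugment: frequency and time masking only (no time warping).

    spec: list of T frames, each a list of F mel-bin values.
    Returns a masked copy; the input spectrogram is not modified.
    """
    out = [row[:] for row in spec]
    T = len(out)
    F = len(out[0]) if T else 0
    # Frequency masks: blank f consecutive mel bins [f0, f0 + f) in every frame.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(max_freq_width, F))
        f0 = rng.randint(0, F - f)
        for row in out:
            for k in range(f0, f0 + f):
                row[k] = fill
    # Time masks: blank t consecutive frames [t0, t0 + t) across all bins.
    for _ in range(num_time_masks):
        t = rng.randint(0, min(max_time_width, T))
        t0 = rng.randint(0, T - t)
        for i in range(t0, t0 + t):
            out[i] = [fill] * F
    return out
```

Because the masks are redrawn on every call, each training epoch sees a differently corrupted view of the same utterance. This complements waveform-level noise mixing, which perturbs the audio before feature extraction.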
Salsabila Zahirah Pranida
MBZUAI
Muhammad Cendekia Airlangga
MBZUAI
Rifo Ahmad Genadi
MBZUAI
Shady Shehata
University of Waterloo
Artificial Intelligence, Natural Language Processing