RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks

📅 2026-03-02

🤖 AI Summary

This study addresses the poor out-of-distribution (OOD) generalization of automatic speech recognition (ASR) systems for low-resource Romanian by introducing RO-N3WS, a benchmark dataset comprising over 126 hours of transcribed speech from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcasts. The work evaluates state-of-the-art models, including Whisper and Wav2Vec 2.0, in both zero-shot and fine-tuned settings, and runs controlled comparisons using synthetic speech generated with expressive TTS models. The results show that even limited fine-tuning on real RO-N3WS speech substantially reduces word error rate (WER) relative to zero-shot baselines. These findings point to data diversity as an important factor in OOD generalization for low-resource ASR, and the authors plan to release models, scripts, and data splits to support reproducible research.

📝 Abstract

We introduce RO-N3WS, a benchmark Romanian speech dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. RO-N3WS comprises over 126 hours of transcribed audio collected from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcast speech. This diversity enables robust training and fine-tuning across stylistically distinct domains. We evaluate several state-of-the-art ASR systems (Whisper, Wav2Vec 2.0) in both zero-shot and fine-tuned settings, and conduct controlled comparisons using synthetic data generated with expressive TTS models. Our results show that even limited fine-tuning on real speech from RO-N3WS yields substantial WER improvements over zero-shot baselines. We will release all models, scripts, and data splits to support reproducible research in multilingual ASR, domain adaptation, and lightweight deployment.
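The WER figures referred to above are conventionally computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation script, and the function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    real evaluations often also strip punctuation or normalize diacritics.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences, not taken from RO-N3WS.
    reference = "bună ziua tuturor ascultătorilor"
    hypothesis = "buna ziua tuturor"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Comparing a zero-shot model with a fine-tuned one then amounts to averaging or pooling this quantity over the same test split for both systems.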

Problem

Research questions and friction points this paper is trying to address.

low-resource ASR

generalization

out-of-distribution

Romanian speech

speech recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

low-resource ASR

out-of-distribution generalization

multilingual speech benchmark

domain adaptation

expressive TTS
