RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the poor out-of-distribution (OOD) generalization of automatic speech recognition (ASR) systems for low-resource Romanian by introducing RO-N3WS, a benchmark dataset comprising over 126 hours of transcribed speech from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcasts. The work evaluates state-of-the-art models, including Whisper and Wav2Vec 2.0, in both zero-shot and fine-tuned settings, and runs controlled comparisons between real speech and synthetic speech generated with expressive TTS models. The results show that even limited fine-tuning on real RO-N3WS data substantially reduces word error rate (WER) relative to zero-shot baselines. These findings point to the value of diverse training data for OOD generalization in low-resource ASR, and the authors plan to release all models, scripts, and data splits to support reproducible research.

📝 Abstract
We introduce RO-N3WS, a benchmark Romanian speech dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. RO-N3WS comprises over 126 hours of transcribed audio collected from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcast speech. This diversity enables robust training and fine-tuning across stylistically distinct domains. We evaluate several state-of-the-art ASR systems (Whisper, Wav2Vec 2.0) in both zero-shot and fine-tuned settings, and conduct controlled comparisons using synthetic data generated with expressive TTS models. Our results show that even limited fine-tuning on real speech from RO-N3WS yields substantial WER improvements over zero-shot baselines. We will release all models, scripts, and data splits to support reproducible research in multilingual ASR, domain adaptation, and lightweight deployment.
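Since the results are reported as WER, a minimal sketch of how that metric is commonly computed may help: the word-level edit distance between a reference transcript and a model hypothesis, divided by the number of reference words. This is an illustration only, not the authors' evaluation script; the function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from RO-N3WS.
score = word_error_rate("ana are mere și pere", "ana are pere")
print(f"WER = {score:.2f}")
```

Under this definition, WER can exceed 1.0 when the hypothesis contains many inserted words. Real evaluations usually apply stronger text normalization (punctuation, numerals, diacritics) before scoring, and that choice can shift the reported numbers.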
Problem

Research questions and friction points this paper is trying to address.

low-resource ASR
generalization
out-of-distribution
Romanian speech
speech recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

low-resource ASR
out-of-distribution generalization
multilingual speech benchmark
domain adaptation
expressive TTS
Alexandra Diaconu
Department of Computer Science, University of Bucharest
Mădălina Vînaga
Department of Computer Science, University of Bucharest
Bogdan Alexe
Faculty of Mathematics and Computer Science, University of Bucharest
Computer Vision, Machine Learning, Artificial Intelligence