Fotheidil: an Automatic Transcription System for the Irish Language

📅 2024-12-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor automatic speech recognition (ASR) performance and the high cost of manual correction for Irish, a low-resource language, this paper introduces Fotheidil, the first web-based transcription system for Irish. The approach has three parts: (1) semi-supervised learning is used to improve the acoustic model of a modular TDNN-HMM ASR system, making it more robust on dialectal and out-of-domain speech; (2) a sequence-to-sequence (Seq2Seq) model replaces the conventional classification-based approach to restore capitalization and punctuation jointly; (3) off-the-shelf pretrained voice activity detection (VAD) and speaker diarization models are integrated, and human-corrected transcriptions are fed back into the training data in a community-driven cycle. Experiments show substantial improvements on out-of-domain test sets and on dialects underrepresented in the supervised training data, with a reported 18.7% relative reduction in word error rate (WER). The system will be made freely available for public use, providing a resource for researchers and others who transcribe Irish-language materials.
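As a rough illustration of what a relative WER reduction measures, the sketch below computes WER as word-level edit distance divided by reference length, then compares two hypotheses. The function names and the toy Irish sentences are assumptions made for this example; they are not taken from the paper or its evaluation data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error that the improved system removes."""
    return (baseline_wer - improved_wer) / baseline_wer


# Toy data, invented for illustration only.
reference = "tá an aimsir go maith inniu"
baseline = word_error_rate(reference, "tá an aimsir go math")
improved = word_error_rate(reference, "tá an aimsir go maith")
print(f"baseline WER={baseline:.3f}, improved WER={improved:.3f}, "
      f"relative reduction={relative_reduction(baseline, improved):.1%}")
```

With these toy inputs the baseline makes two errors against six reference words and the improved output makes one, so the relative reduction is 50%. A relative figure is always measured against the baseline WER, not in absolute percentage points.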

📝 Abstract
This paper sets out the first web-based transcription system for the Irish language, Fotheidil, a system that utilises speech-related AI technologies as part of the ABAIR initiative. The system includes both off-the-shelf pre-trained voice activity detection and speaker diarisation models and models trained specifically for Irish automatic speech recognition and for capitalisation and punctuation restoration. Semi-supervised learning is explored to improve the acoustic model of a modular TDNN-HMM ASR system, yielding substantial improvements for out-of-domain test sets and for dialects that are underrepresented in the supervised training set. A novel approach to capitalisation and punctuation restoration involving sequence-to-sequence models is compared with the conventional approach using a classification model. Experimental results also show substantial improvements in performance for this task. The system will be made freely available for public use, and represents an important resource for researchers and others who transcribe Irish language materials. Human-corrected transcriptions will be collected and included in the training dataset as the system is used, which should lead to incremental improvements to the ASR model in a cyclical, community-driven fashion.
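The two formulations of capitalisation and punctuation restoration compared in the abstract can be illustrated by how a training example might be built for each. The sketch below is a minimal illustration under stated assumptions: the label names, the `case_label` rule and both helper functions are hypothetical and are not the paper's implementation. In the classification view each lower-cased token receives a case label and a punctuation label; in the sequence-to-sequence view the model maps the unformatted string directly to the formatted one.

```python
# Hypothetical punctuation label set, chosen only for this example.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def case_label(word: str) -> str:
    """Assign a coarse casing class to a single word."""
    if word == word.lower():
        return "LOWER"
    if word == word.capitalize():
        return "CAP"
    if word == word.upper():
        return "UPPER"
    return "MIXED"  # e.g. a capital that is not word-initial


def to_classification_example(text: str):
    """Return lower-cased tokens plus one (case, punctuation) label per token."""
    tokens, labels = [], []
    for raw in text.split():
        punct = "NONE"
        if raw[-1] in PUNCT_LABELS:
            punct = PUNCT_LABELS[raw[-1]]
            raw = raw[:-1]
        if not raw:  # skip stray punctuation with no word attached
            continue
        tokens.append(raw.lower())
        labels.append((case_label(raw), punct))
    return tokens, labels


def to_seq2seq_example(text: str):
    """Return a (source, target) pair: unformatted input, formatted output."""
    tokens, _ = to_classification_example(text)
    return " ".join(tokens), text


sentence = "Tá mé i mo chónaí i nGaillimh."
print(to_classification_example(sentence))
print(to_seq2seq_example(sentence))
```

A word such as `nGaillimh`, where the capital is not the first letter, falls into the catch-all `MIXED` class in this toy label scheme, whereas the sequence-to-sequence target simply contains the correctly formatted word.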

Problem

Research questions and friction points this paper is trying to address.

Automatic Transcription
Irish Language
Efficiency Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Artificial Intelligence
Capitalization and Punctuation Accuracy
Community-Enhanced Model Optimization
Liam Lonergan
Phonetics and Speech Laboratory, Trinity College Dublin
Speech Recognition, Speech Processing, Linguistics, Irish, Deep Learning
I. Saratxaga
HiTZ Basque Center for Language Technology, AhoLab, University of the Basque Country UPV/EHU
John Sloan
Phonetics and Speech Laboratory, Trinity College Dublin
Oscar Maharog
Phonetics and Speech Laboratory, Trinity College Dublin
Mengjie Qian
University of Cambridge
speech recognition, machine learning, spoken language assessment, low-resource
Neasa Ní Chiaráin
Phonetics and Speech Laboratory, Trinity College Dublin
Christer Gobl
Phonetics and Speech Laboratory, Trinity College Dublin
A. N. Chasaide
Phonetics and Speech Laboratory, Trinity College Dublin