🤖 AI Summary
This work addresses the challenge of building automatic speech recognition (ASR) systems for critically endangered languages, such as Manx Gaelic and Cornish, that lack sentence-level annotated speech corpora. We propose a paradigm that substitutes spoken pronunciation dictionaries for conventional conversational speech data. Methodologically, we employ an end-to-end ASR architecture enhanced by dictionary-based forced alignment and acoustic model fine-tuning, optimized for short, non-contiguous, and low-volume audio inputs. Experiments show that a functional ASR system can be built from only ~40 minutes of pronunciation dictionary audio, without manual text transcriptions or time-aligned annotations, achieving a word error rate below 50% on Manx; the approach is then replicated on Cornish. This substantially lowers the data requirements for endangered-language ASR and offers empirical evidence that a pronunciation dictionary can serve as the sole training data in ultra-low-resource settings. It suggests a scalable pathway for ASR development in the many under-resourced languages that lack spoken corpora.
📝 Abstract
Nearly half of the world's languages are endangered. Speech technologies such as Automatic Speech Recognition (ASR) are central to revival efforts, yet most languages remain unsupported because standard pipelines expect utterance-level supervised data. Speech data often exist for endangered languages but rarely match these formats. Manx Gaelic ($\sim$2,200 speakers), for example, has had transcribed speech since 1948, yet remains unsupported by modern systems. In this paper, we explore how little data, and in what form, is needed to build ASR for critically endangered languages. We show that a short-form pronunciation resource is a viable alternative, and that 40 minutes of such data produces usable ASR for Manx ($<$50% WER). We replicate our approach, applying it to Cornish ($\sim$600 speakers), another critically endangered language. Results show that the barrier to entry, in quantity and form, is far lower than previously thought, giving hope to endangered language communities that cannot afford to meet the requirements arbitrarily imposed upon them.
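The headline result above is stated as a word error rate (WER). As a reading aid, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own scoring setup (text normalization, tooling) is not described in this section.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization, which a real evaluation would likely need.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, with placeholder tokens, `word_error_rate("a b c d", "a x c")` gives 0.5: one substitution and one deletion against four reference words. Because insertions count as errors, WER can exceed 100%, so a figure below 50% means fewer than one word-level error for every two reference words.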