OLaPh: Optimal Language Phonemizer

📅 2025-09-24
🤖 AI Summary
Traditional phonemization methods show low accuracy and poor consistency on proper names, loanwords, abbreviations, and homographs. To address this, OLaPh introduces an integrated text-to-phoneme conversion framework: it builds a large-scale, multi-source pronunciation dictionary; incorporates NLP-based preprocessing, compound-word segmentation, and rule-based engines; and uses a probabilistic scoring function to choose among the candidate pronunciations these strategies produce. In addition, a large language model is trained on OLaPh-generated data to generalize better to out-of-vocabulary words and cases the framework leaves unresolved. On German and English evaluation data, including a challenging dataset, OLaPh improves phonemization accuracy over previous approaches, and the LLM generalizes further still. The framework and LLM are made freely available as a resource for future research on speech synthesis frontends.

📝 Abstract
Phonemization, the conversion of text into phonemes, is a key step in text-to-speech. Traditional approaches use rule-based transformations and lexicon lookups, while more advanced methods apply preprocessing techniques or neural networks for improved accuracy on out-of-domain vocabulary. However, all systems struggle with names, loanwords, abbreviations, and homographs. This work presents OLaPh (Optimal Language Phonemizer), a framework that combines large lexica, multiple NLP techniques, and compound resolution with a probabilistic scoring function. Evaluations in German and English show improved accuracy over previous approaches, including on a challenging dataset. To further address unresolved cases, we train a large language model on OLaPh-generated data, which achieves even stronger generalization and performance. Together, the framework and LLM improve phonemization consistency and provide a freely available resource for future research.
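
As a rough illustration of how a lexicon lookup, compound resolution, and a probabilistic score can work together, the sketch below splits an unknown word into known parts and keeps the highest-scoring split. It is not OLaPh's implementation: the toy lexicon, its frequencies and transcriptions, the names `LEXICON`, `SPLIT_LOG_PENALTY`, `segmentations`, `score`, and `phonemize`, and the scoring formula itself are all assumptions made for this example.

```python
import math

# Hypothetical toy lexicon: word -> (phoneme string, relative frequency).
# Entries and numbers are illustrative only, not taken from OLaPh.
LEXICON = {
    "stau": ("ʃtaʊ", 0.6),
    "becken": ("bɛkən", 0.5),
    "staub": ("ʃtaʊp", 0.5),
    "ecken": ("ɛkən", 0.3),
    "haus": ("haʊs", 0.9),
    "boot": ("boːt", 0.8),
}

# Assumed log-cost added for every extra part, so fewer splits are preferred.
SPLIT_LOG_PENALTY = math.log(0.5)


def segmentations(word):
    """Yield every way to split `word` into parts that all occur in LEXICON."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in LEXICON:
            for rest in segmentations(word[end:]):
                yield [head] + rest


def score(parts):
    """Log-probability of a split: summed part log-frequencies plus split penalty."""
    log_prob = sum(math.log(LEXICON[part][1]) for part in parts)
    return log_prob + SPLIT_LOG_PENALTY * (len(parts) - 1)


def phonemize(word):
    """Return a phoneme string, or None if no lexicon-based analysis exists."""
    word = word.lower()
    if word in LEXICON:  # a direct lexicon hit wins outright
        return LEXICON[word][0]
    candidates = list(segmentations(word))
    if not candidates:  # a real system would fall back to rules or a model here
        return None
    best = max(candidates, key=score)
    return "".join(LEXICON[part][0] for part in best)


print(phonemize("Hausboot"))
print(phonemize("Staubecken"))
```

The German word "Staubecken" can be read as "Stau" + "Becken" or as "Staub" + "Ecken", and the two readings are pronounced differently. With the assumed frequencies, the first split scores higher, so it determines the output. A real system would need far richer evidence than part frequencies to resolve such cases reliably.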
Problem

Research questions and friction points this paper is trying to address.

Improving phonemization accuracy for challenging vocabulary like names and loanwords
Addressing limitations of traditional rule-based and neural phonemization approaches
Enhancing consistency and generalization in text-to-phoneme conversion systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines large lexica with multiple NLP techniques and compound resolution
Uses a probabilistic scoring function to select among candidate phonemizations
Trains an LLM on OLaPh-generated data for stronger generalization