Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

📅 2025-07-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the out-of-vocabulary (OOV) problem in Korean NLP, this paper proposes KOPL, a novel framework that exploits the highly regular phoneme–grapheme correspondence in Korean to jointly model phonemic and word-level representations. KOPL performs phoneme segmentation and learns phoneme-level embeddings, which are seamlessly integrated into both static and contextualized word embedding models, enabling plug-and-play deployment. Its core contribution is the first systematic incorporation of pronunciation information to enhance Korean word vector representations, thereby capturing both orthographic and phonological semantics. Evaluated on multiple Korean downstream tasks—including part-of-speech tagging, named entity recognition, and dependency parsing—KOPL achieves an average improvement of 1.9% over prior state-of-the-art methods. This work establishes a scalable, multimodal paradigm for OOV modeling in low-resource languages.
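The phoneme segmentation step described above rests on the regular structure of Hangul: every precomposed syllable decomposes deterministically into phoneme-level jamo. A minimal sketch of that decomposition, using the standard Unicode arithmetic for the syllable block (the paper's actual segmentation and embedding scheme may differ; `to_jamo` is an illustrative helper, not from KOPL):

```python
# Jamo inventories: 19 initial consonants, 21 vowels, 27 final consonants (+ none).
CHOSEONG = [chr(0x1100 + i) for i in range(19)]
JUNGSEONG = [chr(0x1161 + i) for i in range(21)]
JONGSEONG = [""] + [chr(0x11A8 + i) for i in range(27)]

def to_jamo(word: str) -> list[str]:
    """Decompose precomposed Hangul syllables into phoneme-level jamo."""
    jamo = []
    for ch in word:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # inside the precomposed Hangul syllable block
            jamo.append(CHOSEONG[code // 588])          # initial consonant
            jamo.append(JUNGSEONG[(code % 588) // 28])  # vowel
            tail = JONGSEONG[code % 28]                 # optional final consonant
            if tail:
                jamo.append(tail)
        else:
            jamo.append(ch)  # pass non-Hangul characters through unchanged
    return jamo

print(to_jamo("한"))  # → ['ᄒ', 'ᅡ', 'ᆫ']
```

Embeddings looked up for these jamo could then be pooled and combined with the word-level vector, which is what makes the approach applicable even to words never seen in training.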

📝 Abstract
In this study, we introduce KOPL, a novel framework for handling Korean OOV words with Phoneme representation Learning. Our work builds on a linguistic property of Korean as a phonemic script: the high correlation between phonemes and letters. KOPL incorporates phoneme and word representations for Korean OOV words, enabling Korean OOV word representations to capture both the text and phoneme information of words. We empirically demonstrate that KOPL significantly improves performance on Korean Natural Language Processing (NLP) tasks, while being readily integrated into existing static and contextual Korean embedding models in a plug-and-play manner. Notably, we show that KOPL outperforms the state-of-the-art model by an average of 1.9%. Our code is available at https://github.com/jej127/KOPL.git.
Problem

Research questions and friction points this paper is trying to address.

Handling Korean OOV words using phoneme representation learning
Improving Korean NLP tasks with phoneme and text integration
Enhancing existing Korean embedding models in a plug-and-play manner
Innovation

Methods, ideas, or system contributions that make the work stand out.

Phoneme representation learning for Korean OOV words
Combines phoneme and word representations
Plug-and-play integration with existing models