Coreference Resolution for Vietnamese Narrative Texts

📅 2025-04-28
🤖 AI Summary
This study addresses coreference resolution for Vietnamese narrative texts under low-resource conditions. We construct the first high-quality, manually annotated Vietnamese coreference dataset, derived from VnExpress news articles, to fill a critical gap in annotated resources for this language. Using standardized annotation guidelines and prompt engineering, we systematically evaluate GPT-3.5-Turbo and GPT-4 in zero-shot and few-shot settings. The experiments show that GPT-4 significantly outperforms GPT-3.5-Turbo in both accuracy and response consistency, which supports its use as a practical tool for Vietnamese coreference resolution. Our key contributions are: (1) the release of the first Vietnamese coreference resolution benchmark specifically designed for narrative text; and (2) the first empirical investigation of the capabilities and limitations of large language models on coreference tasks in a low-resource language.
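
To make the zero-shot versus few-shot distinction concrete, here is a minimal Python sketch of how the two prompt variants could be assembled. The instruction wording, the `build_prompt` helper, and the example sentence are all hypothetical; the summary does not reproduce the prompts actually used in the study.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the study's actual prompt is not shown here.
INSTRUCTION = (
    "Identify all expressions in the Vietnamese text that refer to the same "
    "entity and group them into clusters."
)


def build_prompt(text: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt when `examples` is empty, else a few-shot one."""
    parts = [INSTRUCTION]
    # Few-shot: prepend worked (text, answer) demonstrations before the query.
    for example_text, example_answer in examples:
        parts.append(f"Text: {example_text}\nClusters: {example_answer}")
    # The query itself ends with an open "Clusters:" slot for the model to fill.
    parts.append(f"Text: {text}\nClusters:")
    return "\n\n".join(parts)


# Illustrative sentence: "Lan went to school. She met Minh."
demo = ("Lan đến trường. Cô ấy gặp Minh.", "[Lan, Cô ấy]")
zero_shot = build_prompt("Ông Nam về nhà. Ông ấy rất mệt.")
few_shot = build_prompt("Ông Nam về nhà. Ông ấy rất mệt.", [demo])
```

The only difference between the two settings in this sketch is whether demonstrations precede the query, so the same function can serve both conditions in an evaluation loop.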

📝 Abstract
Coreference resolution is a vital task in natural language processing (NLP) that involves identifying and linking different expressions in a text that refer to the same entity. This task is particularly challenging for Vietnamese, a low-resource language with limited annotated datasets. To address this challenge, we developed a comprehensive annotated dataset using narrative texts from VnExpress, a widely read Vietnamese online news platform. We established detailed guidelines for annotating entities, focusing on ensuring consistency and accuracy. Additionally, we evaluated the performance of large language models (LLMs), specifically GPT-3.5-Turbo and GPT-4, on this dataset. Our results demonstrate that GPT-4 significantly outperforms GPT-3.5-Turbo in terms of both accuracy and response consistency, making it a more reliable tool for coreference resolution in Vietnamese.
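
As an illustration of what "linking expressions that refer to the same entity" means in practice, the sketch below represents coreference clusters as lists of token spans and scores a predicted clustering against a gold one by pairwise link F1. This is an assumption made for illustration only: the abstract does not state which metric the authors used, and standard coreference evaluations typically rely on MUC, B-cubed, or CEAF. The names `links` and `pairwise_f1` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Mention = Tuple[int, int]   # (start, end) token offsets, end exclusive
Cluster = List[Mention]     # mentions that refer to the same entity


def links(clusters: List[Cluster]) -> Set[FrozenSet[Mention]]:
    """Return every unordered pair of mentions that share a cluster."""
    return {
        frozenset(pair)
        for cluster in clusters
        for pair in combinations(sorted(set(cluster)), 2)
    }


def pairwise_f1(gold: List[Cluster], pred: List[Cluster]) -> float:
    """Pairwise link F1 (illustrative; not necessarily the paper's metric)."""
    gold_links, pred_links = links(gold), links(pred)
    true_positives = len(gold_links & pred_links)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_links)
    recall = true_positives / len(gold_links)
    return 2 * precision * recall / (precision + recall)


# Tokens: Lan(0) đến(1) trường(2) .(3) Cô(4) ấy(5) gặp(6) Minh(7) .(8)
gold = [[(0, 1), (4, 6)], [(7, 8)]]   # {Lan, Cô ấy} and {Minh}
pred = [[(0, 1), (4, 6), (7, 8)]]     # wrongly merges Minh into Lan's cluster
score = pairwise_f1(gold, pred)
```

In this toy case the prediction recovers the one gold link but adds two spurious ones, so recall is perfect while precision drops to one third.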
Problem

Research questions and friction points this paper is trying to address.

Coreference resolution for Vietnamese narrative texts
Limited annotated datasets for Vietnamese language
Evaluating the performance of LLMs (GPT-3.5-Turbo vs. GPT-4)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed an annotated dataset from VnExpress news articles
Evaluated the performance of GPT-3.5-Turbo and GPT-4
Showed that GPT-4 significantly outperforms GPT-3.5-Turbo in accuracy and response consistency
Hieu-Dai Tran
University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Duc-Vu Nguyen
University of Information Technology
Natural Language Processing
Ngan Luu-Thuy Nguyen
University of Information Technology
Natural Language Processing