Coreference Resolution for Vietnamese Narrative Texts

📅 2025-04-28

🤖 AI Summary

This study addresses coreference resolution for Vietnamese narrative texts under low-resource conditions. We construct the first high-quality, manually annotated Vietnamese coreference dataset, derived from VnExpress news articles, which fills a critical gap in annotated resources for this language. Using standardized annotation guidelines and prompt engineering, we systematically evaluate GPT-3.5-Turbo and GPT-4 in zero-shot and few-shot settings. The results show that GPT-4 significantly outperforms GPT-3.5-Turbo in both accuracy and response consistency, supporting its use as a practical tool for Vietnamese coreference resolution. Our key contributions are (1) the release of the first Vietnamese coreference resolution benchmark designed specifically for narrative text, and (2) the first empirical investigation into the capabilities and limitations of large language models on coreference tasks in low-resource languages, providing concrete evidence of their applicability and performance boundaries.
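The summary reports "response consistency" but does not say how it is measured. As a purely hypothetical illustration (not the paper's definition), one simple measure is: send the same prompt several times, normalize each returned clustering so that cluster order and mention order do not matter, and report the fraction of runs that agree with the most frequent answer. The names `normalize` and `consistency` are illustrative, and mentions are plain strings here for readability; a real system would identify mentions by character offsets.

```python
from collections import Counter
from typing import FrozenSet, List

# A clustering is a set of entities; each entity is a set of mentions.
# Mentions are plain strings in this sketch (an assumed simplification).
Clustering = FrozenSet[FrozenSet[str]]


def normalize(clusters: List[List[str]]) -> Clustering:
    """Make a model's cluster list order-insensitive and hashable."""
    return frozenset(frozenset(c) for c in clusters)


def consistency(runs: List[List[List[str]]]) -> float:
    """Hypothetical metric: share of runs matching the most frequent clustering."""
    if not runs:
        raise ValueError("need at least one run")
    counts = Counter(normalize(r) for r in runs)
    _, top = counts.most_common(1)[0]
    return top / len(runs)


# Three runs of the same prompt; the first two differ only in ordering.
runs = [
    [["Lan", "cô ấy"], ["Nam"]],
    [["cô ấy", "Lan"], ["Nam"]],
    [["Lan"], ["cô ấy", "Nam"]],
]
print(round(consistency(runs), 3))  # 0.667
```

Exact-match agreement is deliberately strict; a softer variant could average a clustering metric between each run and the modal answer instead.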

📝 Abstract

Coreference resolution is a vital task in natural language processing (NLP) that involves identifying and linking different expressions in a text that refer to the same entity. This task is particularly challenging for Vietnamese, a low-resource language with limited annotated datasets. To address this gap, we developed a comprehensive annotated dataset using narrative texts from VnExpress, a widely read Vietnamese online news platform. We established detailed guidelines for annotating entities to ensure consistency and accuracy. Additionally, we evaluated the performance of large language models (LLMs), specifically GPT-3.5-Turbo and GPT-4, on this dataset. Our results show that GPT-4 significantly outperforms GPT-3.5-Turbo in both accuracy and response consistency, making it a more reliable tool for coreference resolution in Vietnamese.
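To make "linking expressions that refer to the same entity" concrete, here is a minimal sketch, under the assumption that a system (or an LLM prompt) outputs pairwise links between mentions. A union-find structure turns those links into entity clusters, so links compose transitively. The function name `cluster_mentions` and the example mentions are hypothetical and are not taken from the paper's dataset or code.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_mentions(
    mentions: List[str], links: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group mentions into entities from pairwise same-entity links.

    Assumes every linked mention also appears in `mentions`.
    """
    parent: Dict[str, str] = {m: m for m in mentions}

    def find(m: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two entities

    groups: Dict[str, List[str]] = {}
    for m in mentions:  # text order, so each cluster keeps mention order
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())


# "cô ấy" ("she") refers to Lan; "anh ta" ("he") refers to Nam.
mentions = ["Lan", "Nam", "cô ấy", "anh ta"]
links = [("cô ấy", "Lan"), ("anh ta", "Nam")]
print(cluster_mentions(mentions, links))
# [['Lan', 'cô ấy'], ['Nam', 'anh ta']]
```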

Problem

Coreference resolution for Vietnamese narrative texts

Limited annotated datasets for Vietnamese language

Evaluating the performance of LLMs (GPT-3.5-Turbo vs. GPT-4)

Innovation

Developed an annotated coreference dataset from VnExpress news articles

Evaluated GPT-3.5-Turbo and GPT-4 performance

GPT-4 outperforms GPT-3.5-Turbo significantly
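The abstract reports accuracy without naming a metric. For reference only, this sketch computes MUC, a standard link-based coreference score; it is not necessarily what the paper used. MUC recall counts, for each gold (key) cluster, how many links survive after the cluster is split by the predicted (response) clusters: (|K| minus the number of parts) over (|K| minus 1), summed across clusters. Precision swaps the roles of key and response. Function names and example clusters are hypothetical, and mentions are plain strings for readability.

```python
from typing import List, Set, Tuple


def _muc_side(key: List[Set[str]], response: List[Set[str]]) -> Tuple[int, int]:
    """Numerator and denominator of MUC recall of `response` against `key`."""
    num = den = 0
    for k in key:
        covered: Set[str] = set()
        parts = 0
        for r in response:
            overlap = k & r
            if overlap:
                parts += 1  # one piece of k per overlapping response cluster
                covered |= overlap
        parts += len(k - covered)  # missing mentions become singleton pieces
        num += len(k) - parts
        den += len(k) - 1
    return num, den


def muc(key: List[Set[str]], response: List[Set[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the MUC metric."""
    rn, rd = _muc_side(key, response)
    pn, pd = _muc_side(response, key)
    recall = rn / rd if rd else 0.0
    precision = pn / pd if pd else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


key = [{"Lan", "cô ấy", "cô"}, {"Nam", "anh ta"}]
response = [{"Lan", "cô ấy"}, {"cô", "Nam", "anh ta"}]
print(muc(key, response))  # each value is 2/3
```

Here the response wrongly attaches "cô" to Nam's cluster, costing one gold link (recall 2/3) and adding one spurious link (precision 2/3).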
