Refining Word-Based Grammatical Error Annotation for L2 Korean

📅 2026-05-28
🤖 AI Summary
This study addresses the structural mismatch between word-level evaluation and morpheme-level error localization in Korean grammatical error correction, along with three limitations of existing resources: unnatural target sentences, coarse annotation granularity, and single-reference evaluation. The authors reconstruct target sentences in the NIKL L2 corpus under morphologically constrained realization rules, yielding more natural corrections, and convert the original morpheme-level annotations into word-level m2 edits. They further define an ERRANT-style annotation scheme tailored to Korean that distinguishes functional morpheme, spelling, word boundary, and word order errors, and they augment KoLLA with an additional reference correction to enable multi-reference evaluation. Experiments show that the refined targets have lower perplexity, the converted m2 files agree more closely with source–target edit representations, and the refined resources improve KoBART-based correction under the same model setting. Multi-reference evaluation also reduces the penalty on valid corrections that diverge from a single reference.
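For readers unfamiliar with word-level m2 edits, the sketch below parses one m2 block and applies its edits to whitespace-separated words. It follows the general m2 layout used by ERRANT-style tools (an `S` line for the source, `A` lines for edits); the example sentence, the error label `R:PARTICLE`, and the helper names `parse_m2_block` and `apply_edits` are illustrative assumptions, not the paper's actual data, label set, or code.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    start: int        # index of the first source word covered by the edit
    end: int          # index one past the last covered source word
    etype: str        # error type label (hypothetical label in this example)
    correction: str   # replacement words, space-separated
    annotator: int    # annotator / reference id


def parse_m2_block(block: str):
    """Parse one m2 block into (source_words, edits)."""
    source, edits = [], []
    for line in block.strip().splitlines():
        if line.startswith("S "):
            source = line[2:].split()
        elif line.startswith("A "):
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            edits.append(Edit(start, end, fields[1], fields[2], int(fields[-1])))
    return source, edits


def apply_edits(source, edits, annotator=0):
    """Apply one annotator's edits to the source words."""
    words = list(source)
    chosen = [e for e in edits if e.annotator == annotator and e.start >= 0]
    # Apply right-to-left so that earlier word offsets stay valid.
    for e in sorted(chosen, key=lambda e: e.start, reverse=True):
        words[e.start:e.end] = e.correction.split()
    return words


# Hypothetical example: the postposition on the second word is corrected,
# but the edit is expressed over the whole word, not the bound morpheme.
EXAMPLE = """
S 나는 학교를 갔다
A 1 2|||R:PARTICLE|||학교에|||REQUIRED|||-NONE-|||0
"""

source, edits = parse_m2_block(EXAMPLE)
corrected = apply_edits(source, edits)
print(" ".join(corrected))
```

The example makes the mismatch concrete: the error sits in a single postposition, yet the word-level edit replaces the entire host word.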
📝 Abstract
Korean grammatical error correction (K-GEC) presents a structural mismatch between word-based evaluation and the morpheme-level locus of many learner errors. Postpositions and verbal endings are bound to lexical hosts, but they encode grammatical relations that must be represented in correction and evaluation. This paper refines word-based grammatical error annotation for L2 Korean by addressing three connected problems in existing resources: surface target realization, Korean-specific edit annotation, and single-reference evaluation. We reconstruct target sentences from the National Institute of Korean Language (NIKL) L2 corpus under morphologically constrained realization rules and convert its morpheme-level annotations into word-level m2 edits. We then define a Korean ERRANT-style annotation scheme that preserves the MRU core while distinguishing functional morpheme errors, spelling errors, word boundary errors, and word order errors. We also augment the KoLLA corpus with an additional reference correction, yielding a multi-reference evaluation setting for Korean GEC. Empirical validation shows that the refined NIKL targets yield lower perplexity, the converted m2 files achieve higher agreement with source-target edit representations, and the refined resources improve KoBART-based correction under the same model setting. Multi-reference KoLLA evaluation further reduces the penalty imposed on valid corrections that diverge from a single reference, especially for neural and prompted GEC systems. These results show that Korean GEC evaluation depends not only on correction models, but also on reference data and edit annotations that reflect Korean morphology, spacing, and correction variability.
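The effect of adding a reference can be illustrated with a small sketch. It scores a hypothesis edit set against each reference edit set with F0.5 and keeps the best match, which is a simplified sentence-level version of how edit-based GEC scorers commonly handle multiple references. The edits, the sentence they come from, and the function names are hypothetical; this is not the paper's evaluation code.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; empty denominators count as perfect."""
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def best_reference_score(hyp_edits: set, ref_edit_sets: list, beta: float = 0.5) -> float:
    """Score the hypothesis against every reference and keep the best F-beta."""
    return max(
        f_beta(len(hyp_edits & ref), len(hyp_edits - ref), len(ref - hyp_edits), beta)
        for ref in ref_edit_sets
    )


# Edits are (start, end, correction) over word positions.
# Hypothetical case: two references choose different but valid postpositions.
hyp = {(1, 2, "학교에")}
ref_a = {(1, 2, "학교로")}
ref_b = {(1, 2, "학교에")}

single = best_reference_score(hyp, [ref_a])
multi = best_reference_score(hyp, [ref_a, ref_b])
print(single, multi)
```

With only `ref_a`, the valid hypothesis scores zero; once `ref_b` is available, the same output receives full credit.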
Problem

Research questions and friction points this paper is trying to address.

Korean grammatical error correction
morpheme-level errors
word-based annotation
multi-reference evaluation
L2 Korean
Innovation

Methods, ideas, or system contributions that make the work stand out.

Korean GEC
morpheme-level annotation
word-based m2 edits
multi-reference evaluation
ERRANT-style annotation
Jungyeul Park
Korea Advanced Institute of Science & Technology, South Korea
Kyungtae Lim
Korea Advanced Institute of Science & Technology, South Korea
Wonjun Oh
Korea Advanced Institute of Science & Technology, South Korea
Benjamin Nguyen
Professor, INSA Centre Val de Loire, Inria
Databases, Privacy
Zihao Huang
The University of British Columbia, Canada
Mengyang Qiu
Saint Elizabeth University, USA
Jayoung Song
The Pennsylvania State University, USA