🤖 AI Summary
This study investigates whether prompt engineering alone can approach the performance of fine-tuned large language models on Ukrainian minimal-edit grammatical error correction (GEC). We systematically evaluate twelve large language models (eleven commercial systems from four providers and one open-source Ukrainian-specific model). The strategies are zero-shot and few-shot prompting, minimal-edit constraints, and LLM-assisted prompt optimization, with minimal-edit instructions grounded in Ukrainian grammar. Prompt-based Ukrainian GEC turns out to depend strongly on the language of the prompt, and the error analysis identifies five recurring overcorrection patterns tied to Ukrainian linguistic characteristics. The best-performing configuration, based on Gemini 3.1-Pro, achieves an F0.5 score of 69.22 on the UNLP 2023 GEC-only benchmark. This closes over 90% of the gap to the current fine-tuned state-of-the-art model (F0.5=73.14).
📝 Abstract
Fine-tuned Large Language Models (LLMs) dominate Ukrainian grammatical error correction (GEC), while API-accessed LLMs remain nearly untested on minimal-edit benchmarks. We evaluate 11 commercial LLMs from four providers and one open-source Ukrainian model on the UNLP 2023 GEC-only benchmark, comparing zero-shot, few-shot, minimal-edits, and LLM-assisted prompt optimization strategies. Our best configuration (Gemini 3.1-Pro) reaches F0.5=69.22, closing over 90% of the gap to the fine-tuned state of the art (SOTA, F0.5=73.14). With zero-shot prompts, only Claude models benefit from Ukrainian-language instructions. However, every model achieves its best overall result with Ukrainian minimal-edits prompts, whose language-specific rules can be stated precisely only in Ukrainian. LLM-assisted prompt optimization applied on top of minimal-edits + few-shot prompting achieves the highest score. Detailed minimal-edits instructions yield the largest gains for punctuation and case errors but cause the model to stop correcting several low-frequency error categories. Our error analysis identifies five recurring overcorrection patterns tied to Ukrainian-specific linguistic phenomena. Code, prompts, and outputs are publicly available.
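The F0.5 scores above combine edit-level precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R) with β = 0.5. This weighting favours precision, so unnecessary edits (overcorrections) cost more than missed ones. The following is a minimal sketch of that computation. It assumes counts of true-positive, false-positive, and false-negative edits such as an ERRANT-style scorer would produce. The helper name `f_beta` and its handling of empty denominators are illustrative assumptions, not the benchmark's official scorer.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Edit-level F-beta from true-positive, false-positive and false-negative edit counts.

    Hypothetical helper for illustration; not the official UNLP 2023 scorer.
    """
    # Precision: share of proposed edits that match a reference edit.
    # Assumption: proposing no edits at all counts as perfect precision.
    precision = tp / (tp + fp) if tp + fp else 1.0
    # Recall: share of reference edits that the system proposed.
    # Assumption: having no reference edits counts as perfect recall.
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    # F_beta = (1 + b^2) * P * R / (b^2 * P + R); beta = 0.5 favours precision.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same number of correct edits: overcorrecting (more false positives) is
# penalised more heavily than undercorrecting (more false negatives).
print(round(100 * f_beta(tp=8, fp=2, fn=8), 2))  # precision 0.8, recall 0.5 -> 71.43
print(round(100 * f_beta(tp=8, fp=8, fn=2), 2))  # precision 0.5, recall 0.8 -> 54.05
```

Scores such as 69.22 are this quantity multiplied by 100. The asymmetry in the usage example is why a minimal-edit benchmark scored with F0.5 rewards models that avoid the overcorrection patterns described above.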