LLM-Based Evaluation of Low-Resource Machine Translation: A Reference-less Dialect Guided Approach with a Refined Sylheti-English Benchmark

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Evaluating machine translation for low-resource dialectal languages such as Sylheti is hindered by the scarcity of high-quality reference translations and by interference from intra-dialect linguistic diversity. Method: We propose a reference-free evaluation paradigm leveraging large language models (LLMs), incorporating dialect-aware prompting and a vocabulary-augmented tokenizer to better capture dialectal context; we further introduce a regression head for fine-grained scalar score prediction. Contribution/Results: We construct the first native, human-annotated, reference-free benchmark for Sylheti–English translation evaluation. Experiments across multiple LLMs show that our method achieves up to a +0.1083 improvement in Spearman correlation over state-of-the-art reference-free approaches, confirming its effectiveness. The code and dataset are publicly released.
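The Spearman correlation gain reported above measures how well the model's predicted quality scores rank translations in the same order as human Direct Assessment (DA) scores. A minimal self-contained sketch of the metric, using made-up scores rather than the paper's data:

```python
# Spearman rank correlation: the agreement metric between model-predicted
# quality scores and human DA scores. Pure-Python sketch; the scores below
# are illustrative, not from the paper's benchmark.

def rank(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical DA scores (0-100) and model predictions for five translations.
human = [78, 42, 90, 55, 63]
model = [0.71, 0.40, 0.88, 0.52, 0.60]
print(round(spearman(human, model), 4))  # 1.0: the rankings agree exactly
```

Because Spearman correlation depends only on rank order, the model's scores do not need to share the human scale; a monotone mapping between the two suffices for a perfect score.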

📝 Abstract
Evaluating machine translation (MT) for low-resource languages poses a persistent challenge, primarily due to the limited availability of high-quality reference translations. This issue is further exacerbated in languages with multiple dialects, where linguistic diversity and data scarcity hinder robust evaluation. Large Language Models (LLMs) present a promising solution through reference-free evaluation techniques; however, their effectiveness diminishes in the absence of dialect-specific context and tailored guidance. In this work, we propose a comprehensive framework that enhances LLM-based MT evaluation using a dialect-guided approach. We extend the ONUBAD dataset by incorporating Sylheti-English sentence pairs, corresponding machine translations, and Direct Assessment (DA) scores annotated by native speakers. To address the vocabulary gap, we augment the tokenizer vocabulary with dialect-specific terms. We further introduce a regression head to enable scalar score prediction and design a dialect-guided (DG) prompting strategy. Our evaluation across multiple LLMs shows that the proposed pipeline consistently outperforms existing methods, achieving the highest gain of +0.1083 in Spearman correlation, along with improvements across other evaluation settings. The dataset and the code are available at https://github.com/180041123-Atiq/MTEonLowResourceLanguage.
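The dialect-guided (DG) prompting strategy can be pictured as a template that tells the LLM the source is Sylheti rather than standard Bengali before asking for a quality judgment. The template below is an assumption for illustration; the paper's released code at the GitHub link above defines the actual prompt.

```python
# Hypothetical sketch of a dialect-guided (DG) evaluation prompt.
# The wording, dialect note, and 0-100 score range are assumptions,
# not the paper's exact template.

def build_dg_prompt(source, translation, dialect="Sylheti"):
    """Assemble a reference-free MT evaluation prompt with dialect context."""
    return (
        f"You are evaluating a machine translation from {dialect} "
        f"(a dialect of Bengali) into English. Judge adequacy and fluency, "
        f"paying attention to {dialect}-specific vocabulary and expressions "
        f"that differ from standard Bengali.\n"
        f"Source ({dialect}): {source}\n"
        f"Translation (English): {translation}\n"
        f"Respond with a quality score from 0 to 100."
    )

prompt = build_dg_prompt("তুমি কিতা খরো?", "What are you doing?")
print(prompt)
```

Note the prompt is reference-free: it contains only the source and the candidate translation, with no gold reference, which is exactly the setting the benchmark targets.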
Problem

Research questions and friction points this paper is trying to address.

Evaluating low-resource machine translation without references
Addressing dialect diversity in language model evaluations
Improving LLM-based assessment with dialect-specific guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based reference-free evaluation for low-resource MT
Dialect-guided prompting strategy with augmented vocabulary
Regression head for scalar score prediction
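The regression-head idea above can be sketched as a small linear layer over a pooled LLM hidden state that emits one scalar quality score. Dimensions, pooling choice, and the sigmoid rescaling here are illustrative assumptions; in the paper the head is trained against human DA scores.

```python
# Toy regression head: mean-pool token hidden states, then apply a linear
# map + sigmoid to get a scalar in (0, 1). The random "hidden states" and
# untrained weights are placeholders, not the paper's model.
import math
import random

random.seed(0)
HIDDEN_DIM = 16  # stand-in for the LLM hidden size

# Untrained illustrative weights for the linear head.
W = [random.gauss(0.0, 0.1) for _ in range(HIDDEN_DIM)]
b = 0.0

def predict_score(token_states):
    """token_states: list of per-token hidden vectors (lists of floats).
    Returns a scalar quality score in (0, 1)."""
    n = len(token_states)
    pooled = [sum(vec[d] for vec in token_states) / n for d in range(HIDDEN_DIM)]
    logit = sum(p * w for p, w in zip(pooled, W)) + b
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes to (0, 1)

# Seven hypothetical token states for one translation.
states = [[random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(7)]
score = predict_score(states)
print(0.0 < score < 1.0)  # True: the head always emits a bounded scalar
```

A bounded scalar output is convenient for DA-style supervision, since human scores on a fixed range can be normalized into (0, 1) and regressed directly.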