CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs

📅 2026-05-15

🤖 AI Summary

This study addresses the data privacy risks and high costs of current machine translation quality estimation (QE) methods, which rely on large proprietary language models. The authors propose a single-pass prompting strategy in which open-weight LLMs with fewer than 30 billion parameters simultaneously generate quality scores, MQM error annotations, suggested corrections, and fully post-edited translations. The reported results show competitive system-level correlations with human judgments that exceed those of traditional neural metrics, fine-tuned models, and human inter-annotator agreement, while approaching the capabilities of much larger proprietary models. The approach keeps data local and is presented as a cost-effective, interpretable alternative for translation quality assessment.
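To make the single-prompt idea concrete, here is a minimal sketch of how one request could ask for all four outputs at once and how the reply could be parsed. The paper's actual prompt and output format are not given here, so everything in it is an assumption made for illustration: the prompt wording, the JSON schema, the 0 to 100 score range, and the names `PROMPT_TEMPLATE`, `QEResult`, `build_prompt`, and `parse_response`. The call to a model is deliberately left out.

```python
import json
from dataclasses import dataclass, field

# Hypothetical prompt and schema: not the authors' actual prompt.
PROMPT_TEMPLATE = """You are a translation quality evaluator.
Source ({src_lang}): {source}
Translation ({tgt_lang}): {translation}

Reply with one JSON object containing:
  "score": a number from 0 to 100,
  "errors": a list of {{"span", "category", "severity", "correction"}},
  "post_edit": the fully corrected translation
"""


@dataclass
class QEResult:
    score: float                                 # overall quality score
    errors: list = field(default_factory=list)   # MQM-style error annotations
    post_edit: str = ""                          # fully edited translation


def build_prompt(source, translation, src_lang, tgt_lang):
    """Fill the single prompt that requests every output in one pass."""
    return PROMPT_TEMPLATE.format(
        source=source,
        translation=translation,
        src_lang=src_lang,
        tgt_lang=tgt_lang,
    )


def parse_response(raw):
    """Extract the JSON object from a model reply that may contain extra text."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(raw[start:end + 1])
    return QEResult(
        score=float(data["score"]),
        errors=list(data.get("errors", [])),
        post_edit=str(data.get("post_edit", "")),
    )
```

Requesting a structured reply keeps the score, the error spans, and the post-edit tied to the same model pass, which is what makes the output inspectable. A real pipeline would also need to handle malformed JSON and validate severity and category labels.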

📝 Abstract

Current state-of-the-art Quality Estimation (QE) in machine translation relies on massive, proprietary LLMs, raising data privacy concerns. We demonstrate that smaller, open-source LLMs (<30B parameters) are a viable, cost-effective and privacy-preserving alternative. Using a single-pass prompting strategy, our models simultaneously generate quality scores, MQM error annotations, suggested error corrections, and full post-editions. Our analysis shows these models achieve highly competitive system-level correlations with human judgments that outperform traditional neural metrics, fine-tuned models, and human inter-annotator agreement, effectively approximating the capabilities of much larger proprietary LLMs.
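System-level correlation, as mentioned in the abstract, compares metrics after averaging segment scores per MT system rather than segment by segment. The sketch below shows one common way to compute it. The abstract does not say which correlation statistic the authors use, so Pearson's r is an assumption here, and the function names (`system_means`, `pearson`, `system_level_correlation`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def system_means(rows):
    """Average segment-level scores per system.

    rows: iterable of (system_id, score) pairs.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for system_id, score in rows:
        sums[system_id][0] += score
        sums[system_id][1] += 1
    return {s: total / n for s, (total, n) in sums.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def system_level_correlation(metric_rows, human_rows):
    """Correlate per-system mean metric scores with per-system mean human scores."""
    metric = system_means(metric_rows)
    human = system_means(human_rows)
    # Only systems scored by both the metric and the human annotators count.
    systems = sorted(metric.keys() & human.keys())
    return pearson([metric[s] for s in systems], [human[s] for s in systems])
```

Because scores are averaged before correlating, a metric can rank systems well at this level even when its individual segment scores are noisy, so system-level and segment-level results should not be read interchangeably.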

Problem

Research questions and friction points this paper is trying to address.

Quality Estimation

Machine Translation

Data Privacy

Large Language Models

Open-Weight Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quality Estimation

Open-weight LLMs

Single-pass Prompting