🤖 AI Summary
This study investigates whether machine translation preserves the complexity of source texts and examines how textual complexity relates to translation difficulty. Using the levels of the Common European Framework of Reference for Languages (CEFR) as the measure of complexity, the authors introduce ComplexityMT, a new challenge for analyzing how text complexity and machine translation performance interact. They evaluate three open-weight models, one closed model, and one commercial machine translation system across six languages. The evaluation measures both how CEFR level correlates with translation difficulty and whether translations keep or shift the CEFR level of their sources. The findings show that texts at higher CEFR levels are harder to translate. They also show that, for most languages, translated outputs shift away from the CEFR level of the original. These results offer new empirical insights for machine translation difficulty estimation and multilingual pedagogical content generation.
📝 Abstract
When a text is translated, does the translation retain the complexity of the original? We introduce ComplexityMT, a new challenge for assessing how text complexity and machine translation interact with and influence each other, using the Common European Framework of Reference for Languages (CEFR) levels as the measure of text complexity. Across six languages (Arabic, Dutch, English, French, Hindi, and Russian), we evaluate three open-weight models, one closed model, and a commercial machine translation system on two tasks: i) the correlation between CEFR level and translation difficulty, and ii) shifts in CEFR level between source texts and their translations. Our experiments show that higher CEFR levels make texts more difficult to translate. They also show that, for most languages, machine translation shifts the CEFR level of the target text away from that of the original source. These findings provide new insights for researchers and practitioners working on multilingual pedagogical content generation and machine translation difficulty estimation.
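The two evaluation tasks can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm. CEFR levels are mapped to an ordinal scale (A1=1 through C2=6). Each source text carries a numeric translation-difficulty score, for example one minus an MT quality metric; the abstract does not say which measure is used. Translations have also been assigned CEFR labels by some classifier or annotator. The function names (`cefr_difficulty_correlation`, `cefr_shifts`) and the choice of Spearman's rank correlation are hypothetical illustrations, not the authors' published method. The toy inputs are made up and are not results from the paper.

```python
from statistics import mean

# Assumption: CEFR levels mapped to an ordinal scale, A1=1 ... C2=6.
CEFR_ORDER = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def cefr_difficulty_correlation(source_levels, difficulty_scores):
    """Task i (hypothetical sketch): correlate source CEFR level with difficulty."""
    return spearman([CEFR_ORDER[lvl] for lvl in source_levels], difficulty_scores)


def cefr_shifts(source_levels, target_levels):
    """Task ii (hypothetical sketch): signed shift in CEFR steps, source -> target."""
    return [CEFR_ORDER[t] - CEFR_ORDER[s] for s, t in zip(source_levels, target_levels)]


# Toy, made-up inputs for illustration only.
src = ["A1", "A2", "B1", "B2", "C1", "C2"]
difficulty = [0.10, 0.15, 0.30, 0.35, 0.50, 0.60]
tgt = ["A1", "A2", "B2", "B2", "B2", "C1"]

print(round(cefr_difficulty_correlation(src, difficulty), 6))  # → 1.0
print(cefr_shifts(src, tgt))  # → [0, 0, 1, 0, -1, -1]
```

A rank correlation is used here because CEFR levels are ordinal rather than interval-scaled. A positive shift means the translation was labeled as more complex than its source, and a negative shift means it was labeled as simpler.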