🤖 AI Summary
This study addresses the lack of a unified theoretical account of translationese, the systematic deviation of translated texts from texts originally produced in the target language. It proposes that translationese arises as a rational response by translators to the cognitive load imposed by translation task difficulty, which is operationalised through source-text complexity and cross-lingual transfer difficulty. The authors develop an integrative explanatory model that combines information-theoretic measures (LLM-based surprisal), syntactic complexity, semantic features, and translation-solution entropy. They use an automatic classifier to assign segment-level translatedness scores. Experiments on a bidirectional English–German corpus with written and spoken subcorpora show that task difficulty partly accounts for translationese, particularly in English-to-German translation. In most experiments, cross-lingual transfer difficulty has a stronger influence than source-text complexity. Among all predictors, source-text syntactic complexity and translation-solution entropy emerge as the most robust indicators.
📝 Abstract
Translations systematically diverge from texts originally produced in the target language, a phenomenon widely referred to as translationese. Translationese has been attributed to production tendencies (e.g. interference, simplification), socio-cultural variables, and language-pair effects, yet a unified explanatory account is still lacking. We propose that translationese reflects cognitive load inherent in the translation task itself. We test whether observable translationese can be predicted from quantifiable measures of translation task difficulty. Translationese is operationalised as a segment-level translatedness score produced by an automatic classifier. Translation task difficulty is conceptualised as comprising source-text and cross-lingual transfer components, operationalised mainly through information-theoretic metrics based on LLM surprisal, complemented by established syntactic and semantic alternatives. We use a bidirectional English-German corpus comprising written and spoken subcorpora. Results indicate that translationese can be partly explained by translation task difficulty, especially in English-to-German. For most experiments, cross-lingual transfer difficulty contributes more than source-text complexity. Information-theoretic indicators match or outperform traditional features in written mode, but offer no advantage in spoken mode. Source-text syntactic complexity and translation-solution entropy emerged as the strongest predictors of translationese across language pairs and modes.
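The abstract does not spell out how its information-theoretic features are computed. The sketch below therefore uses only the standard textbook definitions and should not be read as the authors' pipeline. Surprisal of a token is taken as -log2 p(token | context). In the study these probabilities come from an LLM, but here they are hand-supplied, hypothetical numbers. Translation-solution entropy is assumed to be the Shannon entropy over the alternative translations observed for one source item, for example several independent translations of the same word. The function names are hypothetical.

```python
import math
from collections import Counter


def surprisal(prob: float) -> float:
    """Surprisal in bits of an event with probability `prob`, i.e. -log2(p)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def mean_surprisal(token_probs: list[float]) -> float:
    """Average per-token surprisal of a segment.

    `token_probs` holds the (hypothetical) model probability of each token
    given its preceding context; a real setup would obtain these from an LLM.
    """
    if not token_probs:
        raise ValueError("segment must contain at least one token")
    return sum(surprisal(p) for p in token_probs) / len(token_probs)


def translation_solution_entropy(solutions: list[str]) -> float:
    """Shannon entropy in bits of the distribution of alternative translations.

    Assumption: `solutions` lists the translations observed for one source
    item. High entropy means translators disagree, which is read here as a
    proxy for cross-lingual transfer difficulty.
    """
    if not solutions:
        raise ValueError("need at least one translation solution")
    counts = Counter(solutions)
    total = len(solutions)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical token probabilities for a three-token segment.
    print(mean_surprisal([0.5, 0.25, 0.125]))  # 2.0 bits per token
    # Hypothetical alternative German renderings of one English source word.
    print(translation_solution_entropy(["Haus", "Haus", "Gebäude", "Heim"]))  # 1.5 bits
```

When every translator chooses the same solution, the entropy is 0 bits. The more evenly translators split across alternatives, the higher it gets. This is why such a measure is a plausible proxy for how hard a source item is to transfer.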