🤖 AI Summary
This work addresses a limitation of traditional machine translation (MT): it treats translation as a fixed mapping from source to target and overlooks how audience, tone, and communicative intent shape what counts as a good translation. We present a systematic, large-scale evaluation of purpose-driven MT, in which large language models (LLMs) receive explicit instructions describing the purpose of the translation. The evaluation covers 50 languages, five model sizes, and eight text domains. We compare these instructions against semantically matched few-shot examples and paragraph-level context. We also test whether models can self-generate instructions from the surrounding document context when curated instructions are unavailable. Experimental results show that explicit instructions substantially improve translation adaptedness, particularly in informal domains, with larger models, and for higher-resource languages. Instructions outperform both few-shot examples and added context, and self-generated instructions close up to 80% of the adaptedness gap to curated instructions. Conventional automatic MT metrics, however, often fail to capture adaptation quality and can even penalize adapted translations.
📝 Abstract
Translation quality depends on purpose: the same source text demands different translations depending on audience, tone, and communicative intent. Yet MT models and metrics treat translation as a fixed mapping from source to target. LLMs let users specify purpose explicitly alongside the source text, but this capability has not been evaluated at scale. We introduce a systematic evaluation of purpose-driven MT across 50 languages, 5 model sizes, and 8 text domains. We find that (1) explicit instructions substantially improve translation adaptedness, with larger gains on informal domains (conversation, social media), for larger models, and for higher-resource languages; (2) instructions outperform semantically matched few-shot examples and paragraph-level context; (3) traditional MT metrics fail to capture adaptation quality, often penalizing adapted translations; (4) when curated instructions are unavailable, models can self-generate them from surrounding document context, closing up to 80% of the adaptedness gap to curated instructions. Our results establish that purpose-adapted MT is a viable and measurable capability of LLMs, while highlighting the need for purpose-aware metrics.
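The abstract does not say how the "up to 80% of the adaptedness gap" figure is computed. A common convention for "closing X% of the gap" is the ratio (self-generated − baseline) / (curated − baseline), where the baseline is a translation produced without any purpose instruction. The sketch below assumes that convention. The function name `gap_closed` and the scores in the usage line are hypothetical illustrations, not the paper's definition or data.

```python
def gap_closed(baseline: float, self_generated: float, curated: float) -> float:
    """Fraction of the baseline-to-curated adaptedness gap recovered.

    ASSUMPTION: this gap-closure definition is a common convention, not
    necessarily the one used in the paper:
        (self_generated - baseline) / (curated - baseline)
    """
    gap = curated - baseline
    if gap == 0:
        # Curated instructions gave no improvement over the baseline,
        # so the fraction recovered is undefined.
        raise ValueError("curated and baseline scores are equal; gap is undefined")
    return (self_generated - baseline) / gap


# Hypothetical adaptedness scores on a 0-100 scale (illustrative only).
print(gap_closed(baseline=40, self_generated=56, curated=60))
```

With these made-up scores, self-generated instructions recover 16 of the 20 points separating the uninstructed baseline from curated instructions, that is, 80% of the gap. A value of 0 would mean no improvement over the baseline, and 1 would mean matching curated instructions.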