🤖 AI Summary
Accurately translating ultra-long literary texts remains challenging due to the difficulty of preserving metaphors, culturally embedded meanings, and authorial style. Method: This paper proposes TransAgents, a multi-agent collaborative framework inspired by professional translation companies, comprising six specialized roles (CEO, Senior Editor, Junior Editor, Translator, Localization Specialist, and Proofreader) operating in two sequential phases: preparation and execution. It introduces a two-stage coordination paradigm and a hybrid evaluation scheme combining monolingual human preference (MHP) and bilingual LLM preference (BLP), which sidesteps the limitations of reference-based metrics like BLEU. Contribution/Results: Experiments show that TransAgents outperforms GPT-4 and human reference translations in cultural adaptation, stylistic consistency, and overall quality. Both human evaluators and LLMs consistently prefer its outputs, empirically supporting multi-agent collaboration as an effective approach to improving literary translation quality.
📝 Abstract
Literary translation remains one of the most challenging frontiers in machine translation due to the complexity of capturing figurative language, cultural nuances, and unique stylistic elements. In this work, we introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company, including a CEO, Senior Editor, Junior Editor, Translator, Localization Specialist, and Proofreader. The translation process is divided into two stages: a preparation stage, where the team is assembled and comprehensive translation guidelines are drafted, and an execution stage involving sequential translation, localization, proofreading, and a final quality check. Furthermore, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP), which evaluates translations based solely on target-language quality and cultural appropriateness, and Bilingual LLM Preference (BLP), which leverages large language models like GPT-4 for direct text comparison. Although TransAgents achieves lower d-BLEU scores, due to the limited diversity of references, its translations are significantly better than those of other baselines and are preferred by both human evaluators and LLMs over traditional human references and GPT-4 translations. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
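The execution stage described above is a sequential hand-off: each agent refines the output of the previous one. The following is a minimal sketch of that pattern; the `Agent` interface, the `act` callable, and the stub agents are illustrative assumptions, not the paper's actual implementation (which delegates each role to an LLM with role-specific prompts and guidelines from the preparation stage).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of TransAgents' execution stage. Role names follow the
# abstract; the Agent dataclass and act() signature are assumptions for
# illustration only.

@dataclass
class Agent:
    role: str
    act: Callable[[str], str]  # transforms the working text

def run_execution_stage(source_text: str, agents: List[Agent]) -> str:
    """Pass the text through each agent in sequence, as in the execution
    stage: translation -> localization -> proofreading -> final check."""
    text = source_text
    for agent in agents:
        text = agent.act(text)
    return text

# Toy stand-ins for LLM-backed agents: each simply tags the text with its
# role so the sequential hand-off is visible in the output.
def make_stub(role: str) -> Agent:
    return Agent(role=role, act=lambda t, r=role: f"{t} -> [{r}]")

execution_team = [
    make_stub(r)
    for r in ("Translator", "Localization Specialist",
              "Proofreader", "Junior Editor")
]

result = run_execution_stage("chapter 1", execution_team)
print(result)
# chapter 1 -> [Translator] -> [Localization Specialist] -> [Proofreader] -> [Junior Editor]
```

In a real system each `act` would be an LLM call conditioned on the translation guidelines drafted during the preparation stage; the sequential structure itself is what the sketch demonstrates.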