🤖 AI Summary
Existing document-level machine translation (DocMT) systems struggle to model authentic discourse structure: heuristic segmentation decouples segment boundaries from semantic dependencies, which leads to poor cross-sentence and cross-paragraph coherence. To address this, we propose GRAFT, a graph-structured multi-agent framework that explicitly models document-level semantic flow with a directed acyclic graph (DAG). Built on large language models (LLMs), collaborating agents perform segmentation, discourse dependency identification, and context-aware translation within a single discourse-aware pipeline. The framework combines graph-based dependency modelling with context aggregation, reducing reliance on hand-crafted segmentation rules and their associated biases. Experiments across eight translation directions and six domains show consistent improvements: +2.8 d-BLEU on average on the TED test sets from IWSLT2017, and +2.3 d-BLEU for domain-specific English-to-Chinese translation. Further analyses indicate gains in both the coherence and the accuracy of translated documents.
📝 Abstract
Document-level Machine Translation (DocMT) approaches often struggle to capture discourse-level phenomena effectively. Existing approaches either rely on heuristic rules to segment documents into discourse units, which rarely align with the true discourse structure required for accurate translation, or fail to maintain consistency throughout the document during translation. To address these challenges, we propose the Graph-Augmented Agentic Framework for Document-Level Translation (GRAFT), a novel graph-based DocMT system that leverages Large Language Model (LLM) agents for document translation. Our approach integrates segmentation, directed acyclic graph (DAG) based dependency modelling, and discourse-aware translation into a cohesive framework. Experiments conducted across eight translation directions and six diverse domains demonstrate that GRAFT achieves significant performance gains over state-of-the-art DocMT systems. Specifically, GRAFT delivers an average improvement of 2.8 d-BLEU on the TED test sets from IWSLT2017 over strong baselines, and 2.3 d-BLEU for domain-specific translation from English to Chinese. Moreover, our analyses highlight the consistent ability of GRAFT to address discourse-level phenomena, yielding coherent and contextually accurate translations.
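The three stages named in the abstract (segmentation, DAG-based dependency modelling, discourse-aware translation) can be illustrated with a minimal sketch. This is not GRAFT's implementation: the function names `translate_document` and `toy_translate`, the data layout, and the stand-in translator are all hypothetical, and the segments and dependency edges are assumed to be given already rather than produced by LLM agents. The sketch only shows why a DAG is convenient: translating discourse units in topological order guarantees that every unit a segment depends on has been translated before that segment is.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def translate_document(segments, dependencies, translate_fn):
    """Translate discourse units in dependency order.

    segments:     dict mapping unit id -> source text (assumed pre-segmented)
    dependencies: dict mapping unit id -> set of unit ids it depends on
                  (assumed to form a DAG; a cycle raises graphlib.CycleError)
    translate_fn: hypothetical callable (source, context) -> translation,
                  where context is a list of (source, translation) pairs
    """
    graph = {uid: dependencies.get(uid, set()) for uid in segments}
    translations = {}
    # static_order() yields each unit only after all of its predecessors.
    for uid in TopologicalSorter(graph).static_order():
        context = [
            (segments[dep], translations[dep])
            for dep in sorted(dependencies.get(uid, ()))
        ]
        translations[uid] = translate_fn(segments[uid], context)
    return translations


def toy_translate(source, context):
    # Stand-in for an LLM call: upper-cases the text and records how many
    # already-translated units were supplied as context.
    return f"<{source.upper()}|ctx={len(context)}>"


if __name__ == "__main__":
    units = {0: "a", 1: "b", 2: "c"}
    edges = {1: {0}, 2: {0, 1}}  # unit 2 depends on units 0 and 1
    print(translate_document(units, edges, toy_translate))
```

In a real system `translate_fn` would wrap a model call, and the context pairs would be placed in the prompt so that terminology and references stay consistent across dependent units.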