🤖 AI Summary
Machine translation of LaTeX-structured documents—containing mathematical formulas, tables, cross-references, and other markup—suffers from semantic distortion and non-compilable output due to inadequate handling of both linguistic and structural constraints.
Method: We propose a six-agent collaborative translation framework that jointly models natural language and LaTeX structure via syntactic parsing, domain-specific term extraction, context-aware neural translation, and self-correction modules. Placeholder-based content isolation and LaTeX syntax filtering ensure structural integrity during translation.
Contribution/Results: Our framework innovatively integrates translation, validation, summarization, and terminology management within a multi-agent architecture, guaranteeing cross-lingual consistency and end-to-end LaTeX compilability. Experiments demonstrate significant improvements over state-of-the-art MT systems in translation accuracy (BLEU +4.2) and structural fidelity (compilation success rate 98.7%), enabling high-quality, fully compilable multilingual LaTeX document generation.
📝 Abstract
Despite the remarkable progress of modern machine translation (MT) systems on general-domain texts, translating structured LaTeX-formatted documents remains a significant challenge. These documents typically interleave natural language with domain-specific syntax, such as mathematical equations, tables, figures, and cross-references, all of which must be accurately preserved to maintain semantic integrity and compilability. In this paper, we introduce LaTeXTrans, a collaborative multi-agent system designed to address this challenge. LaTeXTrans ensures format preservation, structural fidelity, and terminology consistency through six specialized agents: 1) a Parser that decomposes LaTeX into translation-friendly units via placeholder substitution and syntax filtering; 2) a Translator, Validator, Summarizer, and Terminology Extractor that work collaboratively to ensure context-aware, self-correcting, and terminology-consistent translations; 3) a Generator that reconstructs the translated content into well-structured LaTeX documents. Experimental results demonstrate that LaTeXTrans can outperform mainstream MT systems in both translation accuracy and structural fidelity, offering an effective and practical solution for translating LaTeX-formatted documents.