LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination

📅 2025-08-26
🤖 AI Summary
Problem: Machine translation of LaTeX-structured documents, which contain mathematical formulas, tables, cross-references, and other markup, suffers from semantic distortion and non-compilable output because both linguistic and structural constraints are handled inadequately. Method: We propose a six-agent collaborative translation framework that jointly models natural language and LaTeX structure via syntactic parsing, domain-specific term extraction, context-aware neural translation, and self-correction modules. Placeholder-based content isolation and LaTeX syntax filtering protect structural integrity during translation. Contribution/Results: The framework integrates translation, validation, summarization, and terminology management within a multi-agent architecture, aiming for cross-lingual consistency and end-to-end LaTeX compilability. Experiments show improvements over state-of-the-art MT systems in translation accuracy (BLEU +4.2) and structural fidelity (compilation success rate 98.7%), enabling high-quality, compilable multilingual LaTeX document generation.

📝 Abstract
Despite the remarkable progress of modern machine translation (MT) systems on general-domain texts, translating structured LaTeX-formatted documents remains a significant challenge. These documents typically interleave natural language with domain-specific syntax, such as mathematical equations, tables, figures, and cross-references, all of which must be accurately preserved to maintain semantic integrity and compilability. In this paper, we introduce LaTeXTrans, a collaborative multi-agent system designed to address this challenge. LaTeXTrans ensures format preservation, structural fidelity, and terminology consistency through six specialized agents: 1) a Parser that decomposes LaTeX into translation-friendly units via placeholder substitution and syntax filtering; 2) a Translator, Validator, Summarizer, and Terminology Extractor that work collaboratively to ensure context-aware, self-correcting, and terminology-consistent translations; 3) a Generator that reconstructs the translated content into well-structured LaTeX documents. Experimental results demonstrate that LaTeXTrans can outperform mainstream MT systems in both translation accuracy and structural fidelity, offering an effective and practical solution for translating LaTeX-formatted documents.

Problem

Research questions and friction points this paper is trying to address.

Translating structured LaTeX documents with mixed natural language and domain-specific syntax
Preserving semantic integrity and compilability of LaTeX elements during translation
Ensuring format preservation, structural fidelity, and terminology consistency in translations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for structured translation
Specialized agents ensure format preservation and terminology consistency
Placeholder substitution for syntax preservation
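
The placeholder substitution idea can be illustrated with a minimal sketch. This is not the LaTeXTrans Parser itself: the regular expression, the `<PHn>` placeholder format, and the function names `mask`, `placeholders_intact`, and `unmask` are all assumptions made for illustration, and the pattern covers only inline math and a few single-argument commands. The sketch swaps protected LaTeX spans for opaque tokens before translation, checks that the tokens survived, and restores the original markup afterwards.

```python
import re

# Hypothetical pattern (an assumption, not the paper's rules): inline math
# and a few reference-style commands with a single braced argument.
PROTECTED = re.compile(r"\$[^$]+\$|\\(?:ref|cite|label|eqref)\{[^{}]*\}")


def mask(text):
    """Replace protected LaTeX spans with numbered placeholders."""
    spans = []

    def repl(match):
        spans.append(match.group(0))
        return f"<PH{len(spans) - 1}>"

    return PROTECTED.sub(repl, text), spans


def placeholders_intact(translated, spans):
    """Return True if every placeholder appears exactly once."""
    return all(translated.count(f"<PH{i}>") == 1 for i in range(len(spans)))


def unmask(translated, spans):
    """Put the original LaTeX spans back in place of the placeholders."""
    return re.sub(r"<PH(\d+)>", lambda m: spans[int(m.group(1))], translated)


source = r"As shown in Eq. \eqref{eq:loss}, the loss $L = \sum_i x_i^2$ is convex."
masked, spans = mask(source)

# Stand-in for the output of a real MT system (hand-written for this example).
translated = "Comme le montre l'Eq. <PH0>, la perte <PH1> est convexe."

if placeholders_intact(translated, spans):
    print(unmask(translated, spans))
```

Because the translator only ever sees `<PH0>` and `<PH1>`, it cannot alter the formula or the cross-reference, and a failed `placeholders_intact` check is a cheap signal that a segment should be retranslated or corrected before the document is reassembled.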

Ziming Zhu
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Chenglong Wang
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Shunjie Xing
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Yifu Huo
Northeastern University
Fengning Tian
NiuTrans Research, Shenyang, China
Quan Du
NiuTrans Research, Shenyang, China
Di Yang
School of Computer Science and Engineering, Northeastern University, Shenyang, China; NiuTrans Research, Shenyang, China
Chunliang Zhang
School of Computer Science and Engineering, Northeastern University, Shenyang, China; NiuTrans Research, Shenyang, China
Tong Xiao
School of Computer Science and Engineering, Northeastern University, Shenyang, China; NiuTrans Research, Shenyang, China
Jingbo Zhu
Northeastern University, China

Machine Translation, Language Parsing, Natural Language Processing