Just Use XML: Revisiting Joint Translation and Label Projection

📅 2026-03-12
🤖 AI Summary
This work proposes LabelPigeon, a framework that challenges the claim from prior work that performing machine translation and label projection jointly degrades translation quality. Most cross-lingual label transfer approaches run label projection as a separate step after translation; LabelPigeon instead performs both at once by marking annotated spans with XML tags. The authors also design a direct evaluation scheme for label projection. Across 27 languages and three downstream tasks, LabelPigeon improves cross-lingual transfer over comparable work by up to 39.9 F1 on named entity recognition. Translation quality also improves consistently across 203 languages, which the authors attribute to the additional fine-tuning.

📝 Abstract
Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, finding consistent improvement attributed to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over comparable work, up to +39.9 F1 on NER. Overall, our results demonstrate that XML-tagged label projection provides effective and efficient label transfer without compromising translation quality.
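To make the idea of XML-tagged label projection concrete, here is a minimal sketch of the data handling around the translation step. It is not the LabelPigeon implementation: the helper names `tag_spans` and `extract_spans`, the choice to use the label itself as the tag name, and the hand-written German output standing in for a model translation are all assumptions made for illustration. It also assumes spans are non-overlapping and not nested.

```python
import re

# Matches <LABEL>inner text</LABEL>; the tag naming scheme is an assumption.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")


def tag_spans(text, spans):
    """Wrap each (start, end, label) character span in an XML-style tag."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def extract_spans(tagged):
    """Strip the tags and return (plain_text, [(start, end, label), ...])."""
    plain, spans = [], []
    cursor = 0   # position in the tagged string
    offset = 0   # position in the untagged string built so far
    for match in TAG.finditer(tagged):
        before = tagged[cursor:match.start()]
        plain.append(before)
        offset += len(before)
        inner = match.group(2)
        spans.append((offset, offset + len(inner), match.group(1)))
        plain.append(inner)
        offset += len(inner)
        cursor = match.end()
    plain.append(tagged[cursor:])
    return "".join(plain), spans


source = "Angela Merkel visited Paris."
source_spans = [(0, 13, "PER"), (22, 27, "LOC")]
tagged_source = tag_spans(source, source_spans)

# Hypothetical model output: a translation that carries the tags along.
tagged_target = "<PER>Angela Merkel</PER> besuchte <LOC>Paris</LOC>."
target, target_spans = extract_spans(tagged_target)
```

In this sketch the labels arrive in the target language as a by-product of translation, so no separate alignment step is needed; whether the tags survive translation intact is up to the model, which is what a direct evaluation of label projection would measure.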
Problem

Research questions and friction points this paper is trying to address.

label projection
cross-lingual transfer
machine translation
span annotation
low-resource languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

XML-tagged label projection
joint translation and projection
cross-lingual transfer
LabelPigeon
machine translation
Authors
Thennal D K
Language Technology Group, University of Hamburg
Chris Biemann
Professor for Language Technology, University of Hamburg
language technology, natural language processing, computational linguistics, information retrieval, cognitive computing
Hans Ole Hatzel
Language Technology Group, University of Hamburg