An Automatic Quality Metric for Evaluating Simultaneous Interpretation

📅 2024-07-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing automatic evaluation metrics (e.g., BLEU, TER) fail to capture source–target word-order synchronization in simultaneous interpreting—especially for language pairs with divergent syntactic structures, such as English–Japanese. This paper introduces the first real-time automatic metric specifically designed to assess word-order synchronization in simultaneous interpretation. It integrates a cross-lingual pre-trained model with Spearman’s rank correlation coefficient to quantify the temporal alignment between source input rhythm and target output sequence, jointly optimizing for latency control and output fluency. Unlike conventional metrics that neglect temporal alignment and rhythmic adaptation, our approach explicitly models dynamic synchronization. Experimental results on the NAIST-SIC-Aligned and JNPC datasets demonstrate statistically significant improvements over baseline metrics, achieving a 32% increase in correlation with human judgments of synchronization quality.

Technology Category

Application Category

📝 Abstract
Simultaneous interpretation (SI), the translation of one language to another in real time, starts translation before the original speech has finished. Its evaluation needs to consider both latency and quality. This trade-off is challenging especially for distant word order language pairs such as English and Japanese. To handle this word order gap, interpreters maintain the word order of the source language as much as possible to keep up with original language to minimize its latency while maintaining its quality, whereas in translation reordering happens to keep fluency in the target language. This means outputs synchronized with the source language are desirable based on the real SI situation, and it's a key for further progress in computational SI and simultaneous machine translation (SiMT). In this work, we propose an automatic evaluation metric for SI and SiMT focusing on word order synchronization. Our evaluation metric is based on rank correlation coefficients, leveraging cross-lingual pre-trained language models. Our experimental results on NAIST-SIC-Aligned and JNPC showed our metrics' effectiveness to measure word order synchronization between source and target language.
Problem

Research questions and friction points this paper is trying to address.

Automatic Evaluation
Simultaneous Interpretation
Word Order Synchronization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synchronization Evaluation
Pre-trained Language Models
Simultaneous Machine Translation