Automatic Labelling of Speech Translation Errors

📅 2026-06-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current speech translation systems lack an established methodology for confidence and quality estimation, which limits how far their output can be trusted. This work proposes the Speech Translation Error Labelling (STEL) task and introduces an annotation protocol together with a small, authentic end-to-end evaluation dataset. Experiments show that the text-only quality estimation model XCOMET and the multimodal large language model Qwen2.5-Omni perform the STEL task at roughly half the precision of human annotators. The analysis also finds that direct speech processing is necessary for the STEL task, and that current text-only and speech-processing systems are complementary in labelling translation-only versus speech-processing errors.
📝 Abstract
Errors in speech translations reduce the trustworthiness of Speech Translation (ST) systems and can have serious consequences. Yet there is currently no established methodology for evaluating confidence and quality estimation of speech translations. To initiate progress in this direction, we propose Speech Translation Error Labelling (STEL). We create an annotation protocol and a small authentic end-to-end evaluation dataset, and we analyse how existing text-only and speech-processing systems perform the STEL task. Our results show that text-only XCOMET and the multimodal LLM Qwen2.5-Omni are able to perform the STEL task at roughly half the precision of humans. We also find that direct speech processing is necessary for the STEL task, and that the current text-only and speech-processing systems are complementary in labelling translation-only vs. speech-processing errors in ST.
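To make the "half the precision of humans" comparison concrete, here is a minimal sketch of one way span-level labelling precision could be computed. The abstract does not state how precision is measured, so the overlap criterion, the `label_precision` helper, and the example spans below are all illustrative assumptions, not the paper's protocol or data.

```python
def label_precision(predicted, reference):
    """Fraction of predicted error spans that overlap at least one reference span.

    Spans are (start, end) character offsets, end exclusive.
    Assumption: any overlap counts as a hit; the paper may use a stricter rule.
    """
    if not predicted:
        return 0.0

    def overlaps(a, b):
        # Two half-open intervals overlap when each starts before the other ends.
        return a[0] < b[1] and b[0] < a[1]

    hits = sum(1 for p in predicted if any(overlaps(p, r) for r in reference))
    return hits / len(predicted)


# Hypothetical spans, for illustration only.
human_spans = [(0, 5), (12, 20)]
system_spans = [(1, 4), (30, 35), (40, 44), (50, 52)]

precision = label_precision(system_spans, human_spans)
print(f"precision = {precision:.2f}")
```

With this definition, a system is penalised for every error span it flags that no annotator marked, which is the sense of precision a reader would usually expect in an error-labelling setting.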
Problem

Research questions and friction points this paper is trying to address.

Speech Translation
Error Labelling
Quality Estimation
Confidence Estimation
Automatic Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Speech Translation Error Labelling
Confidence Estimation
Multimodal LLM
Annotation Protocol
End-to-End Evaluation