Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

📅 2026-06-11

🤖 AI Summary

This work studies unsupervised hallucination detection in neural machine translation (NMT) and abstractive summarization using optimal transport (OT). The method scores an output by the geometric distance between its decoder cross-attention distribution and a reference distribution. Across the six decoder layers of a DE–EN translation model, the Wass-to-Unif and Wass-to-Data metrics prove complementary, each suited to different hallucination types. The detection signal is concentrated in layers L1–L4, and faithful translations show an exploratory attention phase early in decoding that hallucinated ones lack. On summarization, the unsupervised detector reaches balanced accuracies of 57.2% and 57.6% on the CNN/DailyMail and XSum portions of AggreFact, above chance but well below supervised baselines. The work thereby clarifies where OT on cross-attention applies as a detector and interpretability tool, and where it is limited.
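The summarization results are reported as balanced accuracy, for which chance level is 50% regardless of class imbalance. The following is a minimal sketch of that metric for binary labels; the function name and the toy labels are hypothetical and not taken from the paper.

```python
def balanced_accuracy(labels, predictions):
    """Mean of per-class recall for binary labels (1 = unfaithful, 0 = faithful)."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    true_neg = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Hypothetical imbalanced toy set: 2 unfaithful and 8 faithful outputs.
labels = [1] * 2 + [0] * 8

# A trivial detector that never flags anything has 80% plain accuracy here,
# but its balanced accuracy is only 0.5, i.e. chance level.
never_flags = [0] * 10
trivial_score = balanced_accuracy(labels, never_flags)
```

Read this way, scores of 57.2% and 57.6% sit a few points above a trivial detector.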

📝 Abstract

Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend this analysis to all six decoder layers of the Fairseq DE-EN model ($N=3{,}414$), showing that Wass-to-Unif and Wass-to-Data are complementary detectors specialised across hallucination types, that detection is concentrated in layers L1–L4 with L5 anti-predictive for subtler types, and that hallucinated translations lack the exploratory attention phase present in correct translations from the first decoding step. We further evaluate whether the geometric signal transfers to abstractive summarization faithfulness detection: our unsupervised OT detector on AggreFact ($N=1{,}116$) achieves $57.2\%$/$57.6\%$ balanced accuracy on CNN/XSum, above chance but substantially below supervised MiniCheck-Flan-T5-L ($69.9\%$/$74.3\%$). This gap is principled: unlike NMT hallucinations, unfaithful summaries can attend correctly to source tokens while misrepresenting their content, a failure mode invisible to concentration-based OT metrics by construction. Structural experiments on T5-base confirm consistent decoder organisation across depth, with Layer 3 showing peak concentration and Layer 12 being most critical for generation quality. Together, the results establish OT on cross-attention as a reliable detector when the failure mode is source disengagement, a principled interpretability tool regardless of task, and fundamentally limited when faithfulness failures occur downstream of attention.
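To make the idea of a distance between a cross-attention distribution and a reference concrete, here is a minimal sketch of a Wass-to-Unif-style score for a single decoder layer. It rests on assumptions not stated in the abstract: attention is averaged over decoding steps to get one distribution over source tokens, and the ground cost is 0/1, in which case the OT distance equals the total variation distance. The function names and attention values are hypothetical, and the paper's exact formulation may differ.

```python
def source_attention_mass(cross_attention):
    """Average a (decoding steps x source tokens) attention matrix over steps,
    then renormalise so the result is a distribution over source tokens."""
    n_steps = len(cross_attention)
    n_src = len(cross_attention[0])
    mass = [sum(row[j] for row in cross_attention) / n_steps for j in range(n_src)]
    total = sum(mass)
    return [m / total for m in mass]


def wass_to_unif(cross_attention):
    """OT distance between the source attention mass and the uniform distribution.
    Assumption: 0/1 ground cost, so the OT cost reduces to total variation."""
    mass = source_attention_mass(cross_attention)
    uniform = 1.0 / len(mass)
    return 0.5 * sum(abs(m - uniform) for m in mass)


# Hypothetical layer outputs: rows are decoding steps, columns are source tokens.
engaged = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Every step attends almost exclusively to the last source token.
collapsed = [[0.05, 0.05, 0.05, 0.85] for _ in range(4)]

score_engaged = wass_to_unif(engaged)      # attention mass is spread evenly
score_collapsed = wass_to_unif(collapsed)  # attention mass is concentrated
```

Under these assumptions, the collapsed pattern receives the larger score. Such a score depends only on where attention mass lands, not on what is generated from the attended tokens, which is consistent with the limitation the abstract describes for summarization. A layer-resolved analysis would compute one score per decoder layer.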

Problem

Research questions and friction points this paper is trying to address.

hallucination detection

neural machine translation

abstractive summarization

optimal transport

cross-attention

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Transport

Hallucination Detection

Cross-Attention Analysis