TUM-MiKaNi at SemEval-2025 Task 3: Towards Multilingual and Knowledge-Aware Non-factual Hallucination Identification

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of detecting non-factual hallucinations in large language models (LLMs) across multilingual settings, this paper proposes a knowledge-enhanced two-stage detection method. The first stage performs cross-lingual fact verification via Wikipedia-based retrieval; the second stage fine-tunes a multilingual BERT model to learn language-agnostic hallucination patterns. This work is the first to deeply integrate knowledge-aware factual validation with multilingual hallucination representation learning, supporting over 14 languages and exhibiting strong generalization. In the SemEval-2025 Task-3 multilingual evaluation (eight languages, including English), our approach ranks among the top ten systems and demonstrates robust zero-shot performance on unseen languages. The proposed framework provides a scalable, high-accuracy, unified solution for multilingual hallucination detection, significantly advancing the capability to assess cross-lingual reliability of LLMs.

Technology Category

Application Category

📝 Abstract

Hallucinations are one of the major problems of LLMs, hindering their trustworthiness and deployment to wider use cases. However, most of the research on hallucinations focuses on English data, neglecting the multilingual nature of LLMs. This paper describes our submission to the SemEval-2025 Task-3 - Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. We propose a two-part pipeline that combines retrieval-based fact verification against Wikipedia with a BERT-based system fine-tuned to identify common hallucination patterns. Our system achieves competitive results across all languages, reaching top-10 results in eight languages, including English. Moreover, it supports multiple languages beyond the fourteen covered by the shared task. This multilingual hallucination identifier can help to improve LLM outputs and their usefulness in the future.

Problem

Research questions and friction points this paper is trying to address.

Identifying multilingual non-factual hallucinations in LLMs

Addressing lack of non-English hallucination research

Improving LLM trustworthiness via retrieval-based verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual BERT-based hallucination pattern identification

Retrieval-based fact verification using Wikipedia

Supports multiple languages beyond task scope

🔎 Similar Papers

No similar papers found.

Authors to Follow