TUM-MiKaNi at SemEval-2025 Task 3: Towards Multilingual and Knowledge-Aware Non-factual Hallucination Identification

📅 2025-07-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To detect non-factual hallucinations produced by large language models (LLMs) in multilingual settings, this paper proposes a knowledge-enhanced two-part pipeline. The first stage performs fact verification via Wikipedia-based retrieval; the second stage fine-tunes a multilingual BERT model to identify common hallucination patterns. The approach combines knowledge-aware factual validation with multilingual hallucination pattern learning and supports languages beyond the fourteen covered by the shared task. In the SemEval-2025 Task-3 (Mu-SHROOM) evaluation, the system achieves competitive results across all languages, reaching top-10 rankings in eight of them, including English. The framework offers a scalable, unified approach to multilingual hallucination detection and helps assess the cross-lingual reliability of LLMs.
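The two-part pipeline described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the in-memory "Wikipedia" index, the function names, and the overlap heuristic standing in for the fine-tuned multilingual BERT model are all assumptions made here for clarity.

```python
# Minimal sketch of a two-stage hallucination detector.
# Stage 1: retrieval-based fact verification against a (toy) Wikipedia index.
# Stage 2: flag claim tokens unsupported by the retrieved evidence.
# The paper uses a fine-tuned multilingual BERT for stage 2; the vocabulary
# heuristic below is only a stand-in to show the interface.

def retrieve_evidence(claim_tokens, index):
    """Stage 1: return the article whose tokens overlap the claim most."""
    def overlap(title):
        return len(set(claim_tokens) & set(index[title].split()))
    best_title = max(index, key=overlap)
    return index[best_title]

def flag_unsupported_tokens(claim_tokens, evidence):
    """Stage 2: mark tokens absent from the evidence as possible hallucinations."""
    evidence_vocab = set(evidence.split())
    return [tok for tok in claim_tokens if tok not in evidence_vocab]

wiki_index = {"Paris": "Paris is the capital of France"}
claim = "Paris is the capital of Spain".split()
evidence = retrieve_evidence(claim, wiki_index)
print(flag_unsupported_tokens(claim, evidence))  # → ['Spain']
```

A real system would replace the toy index with multilingual Wikipedia retrieval and the heuristic with token-level classification, but the two-stage control flow stays the same.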

📝 Abstract
Hallucinations are one of the major problems of LLMs, hindering their trustworthiness and deployment to wider use cases. However, most of the research on hallucinations focuses on English data, neglecting the multilingual nature of LLMs. This paper describes our submission to the SemEval-2025 Task-3 - Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. We propose a two-part pipeline that combines retrieval-based fact verification against Wikipedia with a BERT-based system fine-tuned to identify common hallucination patterns. Our system achieves competitive results across all languages, reaching top-10 results in eight languages, including English. Moreover, it supports multiple languages beyond the fourteen covered by the shared task. This multilingual hallucination identifier can help to improve LLM outputs and their usefulness in the future.
Problem

Research questions and friction points this paper is trying to address.

Identifying multilingual non-factual hallucinations in LLMs
Addressing lack of non-English hallucination research
Improving LLM trustworthiness via retrieval-based verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual BERT-based hallucination pattern identification
Retrieval-based fact verification using Wikipedia
Supports multiple languages beyond task scope
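Hallucination identifiers of this kind typically report where in the output the unsupported content sits, not just which tokens are suspect. As a hedged sketch (the shared task's exact output format is not specified here), flagged tokens can be mapped back to character offsets in the original text:

```python
def token_char_spans(text, flagged_tokens):
    """Map each flagged token to its (start, end) character offsets in text.
    Illustrative only; the task's actual span format may differ."""
    spans, cursor = [], 0
    for tok in text.split():
        start = text.index(tok, cursor)  # find this occurrence, not an earlier one
        end = start + len(tok)
        if tok in flagged_tokens:
            spans.append((start, end))
        cursor = end
    return spans

print(token_char_spans("Paris is the capital of Spain", {"Spain"}))  # → [(24, 29)]
```

Tracking the `cursor` ensures repeated tokens resolve to the correct occurrence rather than the first match in the string.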
Miriam Anschütz
PhD Student of Computer Science, Technical University of Munich
Natural language processing · easy-to-read · text simplification
Ekaterina Gikalo
School of Computation, Information and Technology, Technical University of Munich
Niklas Herbster
School of Computation, Information and Technology, Technical University of Munich
Georg Groh
Adjunct Professor
Social Computing · Natural Language Processing