ATLANTIS at SemEval-2025 Task 3: Detecting Hallucinated Text Spans in Question Answering

📅 2025-08-07
🤖 AI Summary
This study addresses the detection of hallucinated (factually incorrect) content in large language model (LLM) question-answering outputs. We propose a multilingual hallucination detection method that integrates context-aware few-shot prompting with token-level classification. Our approach introduces a novel synthetic dataset and performs fine-tuning under both context-available and context-absent settings to enable fine-grained localization of hallucinated spans within generated text. Evaluated on SemEval-2025 Task 3, our method achieves state-of-the-art performance: first place in the Spanish subtask and top-tier results in English and German subtasks. These outcomes demonstrate strong cross-lingual generalization capability and practical effectiveness for hallucination detection across diverse languages.

📝 Abstract
This paper presents the contributions of the ATLANTIS team to SemEval-2025 Task 3, focusing on detecting hallucinated text spans in question answering systems. Large Language Models (LLMs) have significantly advanced Natural Language Generation (NLG) but remain susceptible to hallucinations, generating incorrect or misleading content. To address this, we explored methods both with and without external context, utilizing few-shot prompting with an LLM, token-level classification, or an LLM fine-tuned on synthetic data. Notably, our approaches achieved top rankings in Spanish and competitive placements in English and German. This work highlights the importance of integrating relevant context to mitigate hallucinations and demonstrates the potential of fine-tuned models and prompt engineering.
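The few-shot prompting approach described in the abstract amounts to assembling a prompt from labeled exemplars and asking the LLM to mark hallucinated spans in a new answer. The sketch below is illustrative only: the instruction wording, the `<hal>` tag convention, and the exemplars are hypothetical, not the paper's actual template.

```python
def build_fewshot_prompt(question, answer, exemplars):
    """Assemble a few-shot prompt asking an LLM to mark hallucinated spans.

    `exemplars` is a list of (question, answer, hallucinated_span) triples.
    The instruction text and <hal>...</hal> markup are assumptions for
    illustration, not the format used in the paper.
    """
    parts = ["Mark the hallucinated span in each answer with <hal>...</hal>.\n"]
    for q, a, span in exemplars:
        # Wrap the known-hallucinated span in tags so the model sees the format.
        marked = a.replace(span, f"<hal>{span}</hal>")
        parts.append(f"Question: {q}\nAnswer: {marked}\n")
    # The query instance is left unmarked for the model to complete.
    parts.append(f"Question: {question}\nAnswer: {answer}")
    return "\n".join(parts)


demo = [("Who wrote Hamlet?",
         "Hamlet was written by Charles Dickens.",
         "Charles Dickens")]
prompt = build_fewshot_prompt("When was Airbus founded?",
                              "Airbus was founded in 1970.", demo)
```

The resulting string would then be sent to the LLM, whose tagged output can be parsed back into character spans.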
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinated text spans in QA systems
Mitigating incorrect content from Large Language Models
Exploring context-based and synthetic data methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilized few-shot prompting with LLM
Applied token-level classification techniques
Fine-tuned LLM on synthetic data
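The token-level classification idea can be illustrated by converting character-level hallucinated spans, as annotated in the task data, into per-token binary labels. This is a minimal sketch using a whitespace tokenizer; the paper's system presumably operates on subword tokens from a fine-tuned encoder, and the example text and span offsets below are invented.

```python
def char_spans_to_token_labels(text, spans):
    """Label each whitespace-delimited token 1 if it overlaps any
    hallucinated character span (start, end), else 0."""
    labels, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)  # locate token to get char offsets
        end = start + len(token)
        pos = end
        # Half-open interval overlap test against every annotated span.
        overlaps = any(s < end and start < e for s, e in spans)
        labels.append(1 if overlaps else 0)
    return labels


# Hypothetical example: characters 22-26 ("1870") are hallucinated.
text = "Airbus was founded in 1870 in Toulouse"
labels = char_spans_to_token_labels(text, [(22, 26)])
# → [0, 0, 0, 0, 1, 0, 0]
```

A token classifier trained on such labels can localize hallucinated spans; mapping its predictions back to character offsets is the inverse of this conversion.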
Catherine Kobus
Research engineer, Airbus
Machine Learning, Deep Learning, Machine Translation, Natural Language Processing, Spoken Language Processing
François Lancelot
Airbus AI Research
Marion-Cécile Martin
Airbus AI Research
Nawal Ould Amer
Airbus AI Research