🤖 AI Summary
This study addresses the detection of hallucinated (factually incorrect) content in the question-answering outputs of large language models (LLMs). We propose a multilingual hallucination detection method that integrates context-aware few-shot prompting with token-level classification. Our approach introduces a novel synthetic dataset and performs fine-tuning under both context-available and context-absent settings to enable fine-grained localization of hallucinated spans within generated text. Evaluated on SemEval-2025 Task 3, our method achieves state-of-the-art performance: first place in the Spanish subtask and top-tier results in the English and German subtasks. These outcomes demonstrate strong cross-lingual generalization and practical effectiveness for hallucination detection across diverse languages.
📝 Abstract
This paper presents the contributions of the ATLANTIS team to SemEval-2025 Task 3, focusing on detecting hallucinated text spans in question answering systems. Large Language Models (LLMs) have significantly advanced Natural Language Generation (NLG) but remain susceptible to hallucinations, generating incorrect or misleading content. To address this, we explored methods both with and without external context, using few-shot prompting with an LLM, token-level classification, or an LLM fine-tuned on synthetic data. Notably, our approaches achieved first place in Spanish and competitive placements in English and German. This work highlights the importance of integrating relevant context to mitigate hallucinations and demonstrates the potential of fine-tuned models and prompt engineering.