CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models

📅 2025-05-25
🤖 AI Summary
Existing hallucination research focuses narrowly on either cross-lingual or cross-modal dimensions in isolation, lacking systematic investigation of their joint occurrence. Method: We introduce CCHall, the first benchmark for joint cross-lingual and cross-modal hallucination detection—explicitly defining and evaluating hallucinations in large language models (LLMs) under multilingual-multimodal mixed inputs. Built upon adversarial test sets derived from multilingual text–image pairs, CCHall integrates human verification with automated metrics to establish a rigorous evaluation protocol. Results: Experiments across leading open- and closed-source LLMs reveal significantly elevated hallucination rates and severely limited generalization in this joint setting. CCHall bridges a critical gap in multidimensional hallucination assessment, providing both a foundational benchmark and an analytical framework to enhance the robustness of multilingual multimodal models.

📝 Abstract
Investigating hallucination issues in large language models (LLMs) within cross-lingual and cross-modal scenarios can greatly advance their large-scale deployment in real-world applications. Nevertheless, current studies are limited to a single scenario, either cross-lingual or cross-modal, leaving a gap in the exploration of hallucinations in joint cross-lingual and cross-modal scenarios. Motivated by this, we introduce a novel joint Cross-lingual and Cross-modal Hallucinations benchmark (CCHall) to fill this gap. Specifically, CCHall simultaneously incorporates both cross-lingual and cross-modal hallucination scenarios, which can be used to assess the cross-lingual and cross-modal capabilities of LLMs. Furthermore, we conduct a comprehensive evaluation on CCHall, exploring both mainstream open-source and closed-source LLMs. The experimental results highlight that current LLMs still struggle with CCHall. We hope CCHall can serve as a valuable resource for assessing LLMs in joint cross-lingual and cross-modal scenarios.
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in LLMs under joint cross-lingual and cross-modal scenarios
Assessing LLMs' capabilities in cross-lingual and cross-modal settings
Evaluating mainstream LLMs on a novel benchmark (CCHall)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint cross-lingual and cross-modal benchmark
Comprehensive evaluation of LLMs
Detects hallucinations in multilingual and multimodal scenarios
Yongheng Zhang
M.S. Student @ CSU | Research Intern @ Tencent
Artificial Intelligence · Large Language Model · World Model
Xu Liu
School of Computer Science and Engineering, Central South University, China
Ruoxi Zhou
School of Computer Science and Engineering, Central South University, China
Qiguang Chen
Harbin Institute of Technology
Chain-of-Thought · Reasoning · Multilingual LLM · Multi-modal LLM
Hao Fei
National University of Singapore
Vision and Language · Large Language Model · Natural Language Processing · World Modeling
Wenpeng Lu
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), China
Libo Qin
School of Computer Science and Engineering, Central South University, China; Key Laboratory of Data Intelligence and Advanced Computing in Provincial Universities, Soochow University, China