Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations

πŸ“… 2025-04-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study investigates whether users leverage web search results to verify the factual accuracy of large language model (LLM) outputs and thereby avoid falling victim to hallucinations. A randomized controlled experiment (N = 560) compared static and dynamic search interfaces against a no-search control in terms of users' hallucination detection accuracy and confidence, incorporating the Need for Cognition (NFC) scale and a three-tier content scheme (genuine / minor hallucination / major hallucination). Results show that dynamic search significantly improves users' accuracy in identifying genuine statements and raises their overall confidence; both static and dynamic search reduce the perceived accuracy of hallucinated content; and high-NFC users are more sensitive to major hallucinations than low-NFC users. This work provides empirical evidence of key cognitive mechanisms underlying human-AI collaborative verification, offering foundational insights for designing hallucination-resilient interactive interfaces grounded in both empirical data and cognitive theory.

πŸ“ Abstract
While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content or 'hallucinations' with potentially disastrous consequences. The recent integration of web search results into LLMs prompts the question of whether people utilize them to verify the generated content, thereby avoiding falling victim to hallucinations. This study (N = 560) investigated how the provision of search results, either static (fixed search results) or dynamic (participant-driven searches), affects participants' perceived accuracy and confidence in evaluating LLM-generated content (i.e., genuine, minor hallucination, major hallucination), compared to the control condition (no search results). Findings indicate that participants in both static and dynamic conditions (vs. control) rated hallucinated content to be less accurate. However, those in the dynamic condition rated genuine content as more accurate and demonstrated greater overall confidence in their assessments than those in the static or control conditions. In addition, those higher in need for cognition (NFC) rated major hallucinations to be less accurate than low-NFC participants, with no corresponding difference for genuine content or minor hallucinations. These results underscore the potential benefits of integrating web search results into LLMs for the detection of hallucinations, as well as the need for a more nuanced approach when developing human-centered systems, taking user characteristics into account.
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in LLM-generated content with the aid of web search results
Comparing static vs. dynamic search results in shaping perceived accuracy
Examining how user traits such as NFC affect hallucination detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates static and dynamic web search results into LLM output evaluation
Compares hallucination detection accuracy across search conditions
Accounts for user cognition traits (NFC) in evaluations
Mahjabin Nahar
The Pennsylvania State University, University Park, PA, USA
Eun-Ju Lee
Seoul National University
computer-mediated communication Β· social cognition Β· social influence
Jin Won Park
Department of Communication, Seoul National University, Seoul, South Korea
Dongwon Lee
The Pennsylvania State University, University Park, PA, USA