AI Summary
This study investigates how users leverage web search to verify the factual accuracy of large language model (LLM) outputs and mitigate hallucination risks. A randomized controlled experiment (N = 560) compared static and dynamic search interfaces, relative to a no-search control, in terms of users' hallucination detection accuracy and confidence, incorporating the Need for Cognition (NFC) scale and a three-tier content scheme (genuine / minor hallucination / major hallucination). Results reveal that dynamic search significantly improves users' accuracy in identifying genuine statements and enhances their overall confidence; both static and dynamic search lower the perceived accuracy of hallucinated content; and high-NFC users are more sensitive to major hallucinations. This work provides empirical evidence on cognitive mechanisms underlying human-AI collaborative verification, offering insights for designing hallucination-resilient interactive interfaces grounded in both empirical data and cognitive theory.
Abstract
While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content, or 'hallucinations', with potentially disastrous consequences. The recent integration of web search results into LLMs prompts the question of whether people utilize them to verify the generated content, thereby avoiding falling victim to hallucinations. This study (N = 560) investigated how the provision of search results, either static (fixed search results) or dynamic (participant-driven searches), affects participants' perceived accuracy and confidence in evaluating LLM-generated content (i.e., genuine, minor hallucination, major hallucination), compared to a control condition (no search results). Findings indicate that participants in both the static and dynamic conditions (vs. control) rated hallucinated content as less accurate. However, those in the dynamic condition rated genuine content as more accurate and demonstrated greater overall confidence in their assessments than those in the static or control conditions. In addition, participants higher in need for cognition (NFC) rated major hallucinations as less accurate than those lower in NFC, with no corresponding difference for genuine content or minor hallucinations. These results underscore the potential benefits of integrating web search results into LLMs for the detection of hallucinations, as well as the need for a more nuanced approach when developing human-centered systems that takes user characteristics into account.