🤖 AI Summary
This study addresses the challenge of hallucinations in large language models (LLMs) during scientific literature analysis, which can compromise the accuracy of information extraction. To mitigate this issue, the authors propose Peer Context Outlier Detection (P-COD), a novel approach that introduces peer context consistency into hallucination detection. Grounded in the assumption that conclusions drawn under similar experimental settings should be consistent, P-COD leverages inter-document relationships to validate cross-paper coherence. By integrating LLMs with corpus-level relational structures, the method dynamically computes confidence scores and flags low-confidence outputs for expert review. Evaluated across six scientific domains, P-COD achieves up to 98% precision in outlier detection, substantially reducing hallucination rates and enhancing both the reliability and efficiency of automated scientific analysis.
📝 Abstract
Reducing hallucinations in Large Language Models (LLMs) is essential for improving the accuracy of data extraction from large text corpora. Current methods, such as prompt engineering and chain-of-thought prompting, focus on individual documents but fail to consider relationships across a corpus. This paper introduces Peer Context Outlier Detection (P-COD), a novel approach that uses the relationships between documents to improve extraction accuracy. Our application domain is scientific literature summarization, where papers with similar experimental settings should draw similar conclusions. By comparing extracted data to validated peer information within the corpus, we adjust confidence scores and flag low-confidence results for expert review; high-confidence results, supported by peer validation, are considered reliable. Our experiments achieve up to 98% precision in outlier detection across six scientific domains, demonstrating that our design reduces hallucinations, enhances trust in automated systems, and lets researchers focus on ambiguous cases, streamlining data extraction workflows.
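The core idea of peer-context validation can be illustrated with a minimal sketch. This is not the paper's implementation; the function name `peer_confidence`, the z-score test, and the threshold are illustrative assumptions standing in for P-COD's corpus-level confidence computation. The sketch assumes a numeric quantity has already been extracted from a target paper and from peer papers with similar experimental settings, and flags the extraction when it is an outlier among its peers:

```python
from statistics import mean, stdev

def peer_confidence(value, peer_values, z_threshold=2.0):
    """Score an extracted value against values from peer papers.

    Returns (confidence, flagged): confidence in [0, 1]; flagged=True
    means the value deviates from its peer context and should be
    routed to expert review. (Illustrative stand-in for P-COD.)
    """
    if len(peer_values) < 2:
        # Too few peers to validate against; defer to an expert.
        return 0.0, True
    mu, sigma = mean(peer_values), stdev(peer_values)
    if sigma == 0:
        # All peers agree exactly; any deviation is suspicious.
        flagged = value != mu
        return (0.0 if flagged else 1.0), flagged
    z = abs(value - mu) / sigma
    # Confidence decays linearly with peer disagreement.
    confidence = max(0.0, 1.0 - z / z_threshold)
    return confidence, z > z_threshold

# Papers with similar experimental settings should report similar
# numbers; an extraction of 9.1 among peers near 2.0 is a likely
# hallucination and gets flagged for review.
peers = [2.1, 1.9, 2.0, 2.2, 1.8]
conf_out, flag_out = peer_confidence(9.1, peers)   # flagged
conf_in, flag_in = peer_confidence(2.05, peers)    # accepted
```

A real system would additionally need to decide which papers count as "peers" (e.g., via similarity of their experimental setups) and to weight peers by how well their extractions have themselves been validated.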