AI Summary
This work addresses the limited accuracy of evidence extraction when large language models are applied to document inconsistency detection. To overcome the shortcomings of conventional direct prompting, the authors propose a "redact-and-retry" framework coupled with a constraint-based filtering mechanism. A comprehensive evaluation metric is introduced to systematically assess the completeness and reliability of extracted evidence. Experimental results demonstrate that the proposed approach significantly enhances evidence-extraction performance, consistently outperforming existing baselines across multiple benchmarks. The method thus provides more robust support for inconsistency detection tasks by improving both the precision and the trustworthiness of the retrieved evidence.
Abstract
Large language models (LLMs) are proving useful across many domains, thanks to the impressive abilities that emerge from large training datasets and large model sizes. However, research on LLM-based approaches to document inconsistency detection remains limited. The task has two key aspects: (i) classifying whether any inconsistency exists, and (ii) providing evidence by identifying the inconsistent sentences. We focus on the latter: we introduce new comprehensive evidence-extraction metrics and a redact-and-retry framework with constrained filtering that substantially improves LLM-based document inconsistency detection over direct prompting. Promising experimental results support our claims.
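The abstract describes the framework only at a high level, so the following is a minimal sketch of how a redact-and-retry loop with constrained filtering might look. All names here (`extract_evidence`, the `llm` callable, `max_retries`) are illustrative assumptions, not the authors' actual implementation: the model is prompted for evidence sentences, candidates not found verbatim in the document are filtered out (the constraint), accepted evidence is redacted from the working text, and the model is queried again until no new valid evidence appears.

```python
def extract_evidence(llm, document, max_retries=3):
    """Hypothetical redact-and-retry evidence-extraction loop.

    `document` is a list of sentences; `llm` is assumed to be a callable
    that takes a prompt string and returns a list of sentences it flags
    as evidence of inconsistency.
    """
    sentences = list(document)  # working copy that we progressively redact
    evidence = []
    for _ in range(max_retries):
        candidates = llm(" ".join(sentences))
        # Constrained filtering: keep only candidates that appear verbatim
        # in the original document, rejecting hallucinated sentences.
        valid = [c for c in candidates if c in document]
        if not valid:
            break  # no new grounded evidence; stop retrying
        evidence.extend(valid)
        # Redact the accepted evidence so the next pass can surface
        # further inconsistent sentences instead of repeating these.
        sentences = [s for s in sentences if s not in valid]
    return evidence
```

The filtering step is what makes the extracted evidence trustworthy: anything the model invents that is not a sentence of the document is discarded before it can count as evidence.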