AI Summary
This work addresses the limited accuracy of evidence extraction when large language models are applied to document inconsistency detection. To overcome the shortcomings of conventional direct prompting, the authors propose a "redact-and-retry" framework coupled with a constraint-based filtering mechanism. A comprehensive evaluation metric is introduced to systematically assess the completeness and reliability of extracted evidence. Experimental results demonstrate that the proposed approach significantly enhances evidence-extraction performance, consistently outperforming existing baselines across multiple benchmarks. The method thus provides more robust support for inconsistency detection tasks by improving both the precision and the trustworthiness of the retrieved evidence.
Abstract
Large language models (LLMs) are proving useful across many domains, thanks to the impressive abilities that emerge from large training datasets and large model sizes. However, research on LLM-based approaches to document inconsistency detection remains limited. The task has two key aspects: (i) classifying whether any inconsistency exists, and (ii) providing evidence by identifying the inconsistent sentences. We focus on the latter: we introduce new comprehensive evidence-extraction metrics and a redact-and-retry framework with constrained filtering that substantially improves LLM-based document inconsistency detection over direct prompting. Promising experimental results support our claims.
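The abstract describes the framework only at a high level, so the following is a minimal sketch of how a redact-and-retry loop with constrained filtering might look. All names here (`extract_evidence`, the `llm` callable, `max_retries`) are illustrative assumptions, not the authors' actual implementation: the model is prompted for evidence sentences, candidates not found verbatim in the document are filtered out (the constraint), accepted evidence is redacted from the working text, and the model is queried again until no new valid evidence appears.

```python
def extract_evidence(llm, document, max_retries=3):
    """Hypothetical redact-and-retry evidence-extraction loop.

    `document` is a list of sentences; `llm` is assumed to be a callable
    that takes a prompt string and returns a list of sentences it flags
    as evidence of inconsistency.
    """
    sentences = list(document)  # working copy that we progressively redact
    evidence = []
    for _ in range(max_retries):
        candidates = llm(" ".join(sentences))
        # Constrained filtering: keep only candidates that appear verbatim
        # in the original document, rejecting hallucinated sentences.
        valid = [c for c in candidates if c in document]
        if not valid:
            break  # no new grounded evidence; stop retrying
        evidence.extend(valid)
        # Redact the accepted evidence so the next pass can surface
        # further inconsistent sentences instead of repeating these.
        sentences = [s for s in sentences if s not in valid]
    return evidence
```

The filtering step is what makes the extracted evidence trustworthy: anything the model invents that is not a sentence of the document is discarded before it can count as evidence.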