FlashCheck: Exploration of Efficient Evidence Retrieval for Fast Fact-Checking

📅 2025-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor scalability and limited real-time performance of evidence retrieval in automated fact-checking, this paper proposes a co-optimization framework that combines lightweight fact indexing with vector quantization (VQ)-based compression. It is the first work to systematically investigate the joint use of concise fact indexing and compressed dense retrieval over large-scale knowledge sources (e.g., Wikipedia), integrating fact extraction, inverted indexing, and VQ techniques to reduce both storage and computational overhead. Evaluated on the HoVer and WiCE benchmarks and on real-world data from the 2024 U.S. presidential debate, the method achieves up to a 10.0× speedup on CPUs and more than a 20.0× speedup on GPUs over classical fact-checking pipelines, while maintaining competitive accuracy. The authors also release the first publicly available fact-checking dataset specifically designed for debate scenarios. This work addresses a gap in research on efficient, deployable evidence retrieval methods for real-time fact-checking systems.

📝 Abstract
Advances in digital tools have led to the rampant spread of misinformation. While fact-checking aims to combat this, manual fact-checking is cumbersome and does not scale. Automated fact-checking must be efficient if it is to help combat misinformation in real time and at the source. Fact-checking pipelines primarily comprise a knowledge retrieval component, which extracts knowledge relevant to a claim from large knowledge sources such as Wikipedia, and a verification component. Existing work focuses primarily on fact verification rather than on evidence retrieval from large data collections, which often faces scalability issues in practical applications such as live fact-checking. In this study, we address this gap by exploring various methods for indexing a succinct set of factual statements from large collections such as Wikipedia to enhance the retrieval phase of the fact-checking pipeline. We also explore the impact of vector quantization on the efficiency of pipelines that employ dense retrieval for first-stage retrieval. We study the efficiency and effectiveness of these approaches on fact-checking datasets such as HoVer and WiCE, using Wikipedia as the knowledge source. We also evaluate the real-world utility of the efficient retrieval approaches by fact-checking the 2024 presidential debate, and we open-source the collection of claims identified in the debate together with their corresponding labels. Through a combination of indexed facts, dense retrieval, and index compression, we achieve up to a 10.0x speedup on CPUs and more than a 20.0x speedup on GPUs compared to classical fact-checking pipelines over large collections.
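The idea of retrieving over a succinct set of factual statements, rather than over full passages, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `FactIndex` class, the toy bag-of-words `embed` function (a stand-in for a real dense encoder), and the three example facts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: unit-normalised bag-of-words counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    # Both inputs are unit-normalised, so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class FactIndex:
    """Hypothetical index over short fact statements instead of full passages."""

    def __init__(self):
        self.facts = []    # (fact_text, source_title) pairs
        self.vectors = []  # one embedding per fact

    def add(self, fact, source):
        self.facts.append((fact, source))
        self.vectors.append(embed(fact))

    def search(self, claim, k=2):
        query = embed(claim)
        scored = sorted(
            ((cosine(query, vec), i) for i, vec in enumerate(self.vectors)),
            reverse=True,
        )
        return [(self.facts[i], score) for score, i in scored[:k]]


index = FactIndex()
index.add("paris is the capital of france", "France")
index.add("the eiffel tower was completed in 1889", "Eiffel Tower")
index.add("berlin is the capital of germany", "Germany")

# Retrieve the single best-matching fact for a claim.
top = index.search("the capital of france is paris", k=1)
print(top)
```

Because each indexed unit is a single short statement, the index holds fewer and smaller entries than a passage-level index would, which is the intuition behind the efficiency gain described in the abstract.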
Problem

Research questions and friction points this paper is trying to address.

Enhance evidence retrieval speed
Improve scalability of fact-checking
Optimize indexing for large datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Indexed facts for efficient retrieval
Dense retrieval with vector quantization
Index compression for speedup
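
How vector quantization compresses a dense index can be sketched in a few lines. The summary does not say which quantization variant the paper uses, so this example assumes plain k-means vector quantization with a single codebook; the function names, the toy 2-D vectors, and the simplified "first k vectors" initialisation are all hypothetical choices made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codebook centroid closest to vec."""
    return min(range(len(codebook)), key=lambda j: sq_dist(vec, codebook[j]))


def train_codebook(vectors, k, iters=10):
    """Plain k-means; initialised from the first k vectors (a simplification)."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(v, codebook)].append(v)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode(vectors, codebook):
    """Replace every vector by the integer id of its nearest centroid."""
    return [nearest(v, codebook) for v in vectors]


def search(query, codes, codebook, k=1):
    """Rank stored items using one query-to-centroid distance table."""
    table = [sq_dist(query, c) for c in codebook]
    ranked = sorted(range(len(codes)), key=lambda i: table[codes[i]])
    return ranked[:k]


# Toy "embeddings": two well-separated groups, interleaved.
vectors = [
    [0.0, 0.1], [10.0, 10.1],
    [0.2, 0.0], [9.8, 10.0],
    [0.1, 0.2], [10.1, 9.9],
]
codebook = train_codebook(vectors, k=2)
codes = encode(vectors, codebook)  # one small integer per stored vector
hits = search([9.9, 10.0], codes, codebook, k=3)
print(codes, hits)
```

Each stored vector shrinks to a single code, and a query needs only one distance computation per centroid rather than one per stored vector. The cost is that items sharing a code become indistinguishable at this stage, which is why compressed first-stage retrieval trades some accuracy for speed and memory.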