🤖 AI Summary
This study addresses fact-checking of numerical and temporal claims by proposing a verification framework that integrates retrieved evidence with the LLaMA large language model. Methodologically, it employs multi-granularity evidence retrieval using BM25 and MiniLM, followed by instruction tuning and parameter-efficient fine-tuning via LoRA; it systematically compares zero-shot prompting against fine-tuning strategies. The core contributions are twofold: (1) empirical identification of evidence granularity as a critical factor governing model generalization, and (2) characterization of the synergistic relationship between evidence selection quality and model adaptability. Experiments demonstrate substantial performance gains over baselines on the English validation set; however, degradation on the test set reveals two key challenges—insufficient evidence utilization and limited cross-scenario generalization—thereby providing both empirical grounding and concrete directions for advancing trustworthy fact-checking research.
📝 Abstract
This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims using retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning with parameter-efficient LoRA. To improve evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned with LoRA, achieves strong performance on the English validation set. However, a notable performance drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.
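The top-k sentence filtering step described above can be illustrated with a minimal sketch. The paper itself relies on off-the-shelf BM25 and MiniLM retrievers; the pure-Python BM25 re-implementation below (function name, claim, and sentences are hypothetical) only shows the idea of scoring candidate evidence sentences against a claim and keeping the k best:

```python
import math
import re
from collections import Counter

def bm25_topk(claim, sentences, k=3, k1=1.5, b=0.75):
    """Rank evidence sentences against a claim with BM25; return the top k.

    Illustrative sketch only: a real system would use a tuned BM25
    implementation (or a MiniLM cross-/bi-encoder) rather than this
    from-scratch scorer.
    """
    tokenize = lambda text: re.findall(r"\w+", text.lower())
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average sentence length
    # Document frequency per term, used for IDF.
    df = Counter(t for d in docs for t in set(d))

    def idf(term):
        return math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)

    query = tokenize(claim)
    scores = []
    for d in docs:
        tf = Counter(d)
        # Standard BM25 term saturation with length normalization.
        score = sum(
            idf(t) * tf[t] * (k1 + 1)
            / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            for t in query
        )
        scores.append(score)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]
```

The selected sentences would then be concatenated into the LLM prompt in place of the full document, which is the granularity trade-off the paper studies.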