🤖 AI Summary
This work addresses the factual inaccuracies that large language models frequently produce in knowledge-intensive tasks, where long-form outputs often contain errors that call for efficient post-hoc correction. To this end, we propose FactCorrector, a training-free, domain-adaptable post-processing framework that introduces, for the first time, a graph-inspired mechanism for leveraging structured factual feedback to precisely correct errors in the model's original responses. We also construct VELI5, a new evaluation benchmark that systematically injects factual errors and provides the corresponding ground-truth corrections. Experiments show that FactCorrector significantly improves factual accuracy on VELI5 and several established long-form factuality datasets while preserving response relevance, outperforming strong existing baselines.
📝 Abstract
Large language models (LLMs) are widely used in knowledge-intensive applications but often generate factually incorrect responses. A promising way to rectify these flaws is to correct LLM outputs using feedback. In this paper, we introduce FactCorrector, a new post-hoc correction method that adapts across domains without retraining and leverages structured feedback about the factuality of the original response to generate a correction. To support rigorous evaluation of factuality correction methods, we also develop VELI5, a novel benchmark containing systematically injected factual errors and ground-truth corrections. Experiments on VELI5 and several popular long-form factuality datasets show that FactCorrector significantly improves factual precision while preserving relevance, outperforming strong baselines. We release our code at https://ibm.biz/factcorrector.
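The abstract does not spell out what "structured feedback about the factuality of the original response" looks like, so the following is only an illustrative sketch, not the paper's actual method. It assumes feedback arrives as per-claim verdicts with suggested corrections (the `FactFeedback` type and `apply_feedback` function below are hypothetical names introduced for this example), and shows how such feedback could be applied post hoc to a response without retraining:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FactFeedback:
    # One structured feedback item: a claim found in the response,
    # whether it was judged factual, and a suggested fix if not.
    claim: str
    is_factual: bool
    correction: Optional[str] = None

def apply_feedback(response: str, feedback: list) -> str:
    """Post-hoc correction sketch: replace each non-factual claim
    with its suggested correction, leaving everything else intact."""
    corrected = response
    for item in feedback:
        if not item.is_factual and item.correction:
            corrected = corrected.replace(item.claim, item.correction)
    return corrected

# Toy example with an injected factual error (in the spirit of VELI5).
response = "The Eiffel Tower was completed in 1899 and stands in Paris."
feedback = [
    FactFeedback(claim="completed in 1899", is_factual=False,
                 correction="completed in 1889"),
    FactFeedback(claim="stands in Paris", is_factual=True),
]
print(apply_feedback(response, feedback))
# → The Eiffel Tower was completed in 1889 and stands in Paris.
```

In FactCorrector itself the feedback is organized via a graph-inspired mechanism and the rewrite is produced by an LLM rather than by string replacement; this stub only illustrates the general post-hoc correction pattern.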