AI Summary
This work addresses the conflict between parametric memory and retrieved evidence that large language models (LLMs) face when answering knowledge-intensive questions. To resolve it, the authors propose TrustMargin, a training-free, plug-and-play arbitration mechanism that uses the LLM's own log-likelihoods to compute a parametric-prior margin and an evidence-binding margin, then selects between the direct-generation answer and the retrieval-augmented generation (RAG) answer according to which source appears more reliable. TrustMargin performs this source arbitration without fine-tuning, external verifiers, or additional generation steps. Evaluated on the 2WikiMQA and CWQA benchmarks with three LLaMA model scales, it consistently outperforms both Direct generation and BM25-RAG baselines, recovers part of the gap to the Direct/RAG oracle, and generalizes to multiple training-free RAG pipelines.
Abstract
Large language models answer knowledge-intensive questions using both parametric memory and retrieved evidence, but neither source is uniformly reliable. Retrieval can fill knowledge gaps, yet distracting passages may override correct closed-book answers. We study this post-generation conflict as answer-level source arbitration: given Direct and RAG answers from the same frozen model, decide which source to trust. We propose TrustMargin, a training-free, plug-and-play arbitration layer that scores the two existing candidates with the model's own likelihoods. It combines a parametric-prior margin, which tests whether memory accepts the retrieved answer, with an evidence-binding margin, which discounts passage-only salience and measures question-specific support. TrustMargin selects between Direct and RAG without fine-tuning, external judges, or additional generation. Across 2WikiMQA and CWQA with three LLaMA scales, TrustMargin consistently improves over Direct generation and BM25-RAG, recovers part of the Direct/RAG oracle gap, and generalizes to multiple training-free RAG pipelines.
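The abstract describes the two margins only qualitatively, so the following Python sketch is one possible reading of likelihood-based answer arbitration, not the authors' implementation. It assumes a hypothetical scorer `loglik(answer, question, passages)` that returns the frozen model's log-likelihood of an answer given a context, with `None` meaning that part is omitted from the prompt. The exact margin formulas, the weighted-sum combination, the zero threshold, and the shortcut for matching answers are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical scorer interface (assumption, not from the paper):
# log p(answer | context) under the frozen LLM, where the context holds the
# question, the retrieved passages, or both (None = omit from the prompt).
LogLik = Callable[[str, Optional[str], Optional[str]], float]


@dataclass
class Margins:
    prior: float     # parametric-prior margin (assumed form)
    evidence: float  # evidence-binding margin (assumed form)


def compute_margins(loglik: LogLik, question: str, passages: str,
                    direct_ans: str, rag_ans: str) -> Margins:
    # Closed-book check: does parametric memory accept the RAG answer
    # relative to the Direct answer? Positive => memory leans toward RAG.
    prior = (loglik(rag_ans, question, None)
             - loglik(direct_ans, question, None))
    # Evidence binding: support for the RAG answer given question + passages,
    # minus how likely it is from the passages alone (passage-only salience).
    evidence = (loglik(rag_ans, question, passages)
                - loglik(rag_ans, None, passages))
    return Margins(prior=prior, evidence=evidence)


def arbitrate(loglik: LogLik, question: str, passages: str,
              direct_ans: str, rag_ans: str,
              w_prior: float = 1.0, w_evidence: float = 1.0,
              threshold: float = 0.0) -> str:
    """Pick the Direct or RAG answer by scoring only; nothing is regenerated."""
    # Assumption: if both sources agree there is nothing to arbitrate.
    if direct_ans.strip().lower() == rag_ans.strip().lower():
        return direct_ans
    m = compute_margins(loglik, question, passages, direct_ans, rag_ans)
    score = w_prior * m.prior + w_evidence * m.evidence
    return rag_ans if score > threshold else direct_ans


def make_toy_loglik(table: dict) -> LogLik:
    """Toy stand-in for an LLM scorer: looks up precomputed log-likelihoods."""
    def loglik(answer: str, question: Optional[str],
               passages: Optional[str]) -> float:
        return table[(answer, question is not None, passages is not None)]
    return loglik


# Case 1: memory rejects the RAG answer, and the passage barely supports it
# more with the question than without it (a likely distractor).
distracted = make_toy_loglik({
    ("Lyon", True, False): -6.0, ("Paris", True, False): -1.0,
    ("Lyon", True, True): -0.5, ("Lyon", False, True): -0.7,
})
# Case 2: memory is nearly indifferent and the evidence is question-specific.
supported = make_toy_loglik({
    ("Lyon", True, False): -1.5, ("Paris", True, False): -1.0,
    ("Lyon", True, True): -0.2, ("Lyon", False, True): -3.0,
})

if __name__ == "__main__":
    print(arbitrate(distracted, "Q?", "passage", "Paris", "Lyon"))  # Paris
    print(arbitrate(supported, "Q?", "passage", "Paris", "Lyon"))   # Lyon
```

Because both candidates already exist, this sketch needs only four scoring passes over fixed text and no extra decoding, which matches the "no additional generation" property the abstract states. The weights and threshold here are free parameters of the sketch; how the paper actually combines the margins is not specified in this section.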