🤖 AI Summary
To address the inefficiency that redundant retrieval causes in Retrieval-Augmented Generation (RAG), this paper proposes the Knowledge Boundary Model (KBM), a lightweight, learnable, plug-and-play binary classifier that triggers retrieval only when a query falls outside the large language model's (LLM's) intrinsic knowledge boundary. The paper categorizes the impact of retrieval on answer quality into three types: beneficial, neutral, and harmful. KBM is trained on bilingual (Chinese/English), multi-source QA data and is designed to generalize across languages and scenarios, including dynamic knowledge updates, long-tail facts, and multi-hop reasoning. Evaluated across 11 benchmarks, KBM reduces average retrieval frequency by up to 47% while maintaining or improving question-answering accuracy. It also remains robust in challenging settings such as dynamic knowledge domains, which indicates that unnecessary retrieval can be cut without compromising performance.
📝 Abstract
Large Language Models (LLMs) are increasingly recognized for their practical applications. However, these models often struggle with dynamically changing knowledge and with static knowledge they have not learned. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. We find that the impact of RAG on the question-answering capabilities of LLMs falls into three groups: beneficial, neutral, and harmful. By minimizing retrieval requests that yield neutral or harmful results, we can reduce both time and computational costs while also improving the overall performance of LLMs. This insight motivates us to differentiate between types of questions, using certain metrics as indicators, so as to decrease the retrieval ratio without compromising performance. We propose a method that identifies these question types by training a Knowledge Boundary Model (KBM). Experiments on 11 English and Chinese datasets show that the KBM effectively delineates the knowledge boundary, significantly decreasing the proportion of retrievals required for optimal end-to-end performance. We further evaluate the effectiveness of KBM in three complex scenarios (dynamic knowledge, long-tail static knowledge, and multi-hop problems) as well as its functionality as an external LLM plug-in.
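The two ideas in the abstract, grouping retrieval impact into three classes and skipping retrieval when it is not needed, can be illustrated with a minimal Python sketch. This is not the paper's implementation. Labeling impact by comparing answer correctness with and without retrieval is an assumption made here for illustration, since the abstract only mentions "certain metrics as indicators". The names `retrieval_impact`, `answer`, `needs_retrieval`, `retrieve`, and `generate` are hypothetical, and `needs_retrieval` merely stands in for a trained KBM.

```python
from typing import Callable, List


def retrieval_impact(correct_without: bool, correct_with: bool) -> str:
    """Label how retrieval changed the outcome for one question.

    Assumption: impact is judged only by whether the answer is correct
    without and with retrieved context.
    """
    if correct_with and not correct_without:
        return "beneficial"  # retrieval fixed a wrong answer
    if correct_without and not correct_with:
        return "harmful"     # retrieval broke a right answer
    return "neutral"         # retrieval did not change correctness


def answer(
    question: str,
    needs_retrieval: Callable[[str], bool],    # stand-in for a trained KBM
    retrieve: Callable[[str], List[str]],      # hypothetical retriever
    generate: Callable[[str, List[str]], str], # hypothetical LLM call
) -> str:
    """Answer a question, calling the retriever only when the gate says so."""
    passages = retrieve(question) if needs_retrieval(question) else []
    return generate(question, passages)
```

In this sketch the gate is an ordinary callable, so any classifier that maps a question to a yes/no retrieval decision could be plugged in without changing the retriever or the generator.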