🤖 AI Summary
Existing RAG frameworks retrieve external knowledge both redundantly and insufficiently: static strategies often cause over-retrieval or reasoning failure, while current adaptive methods rely solely on query complexity estimation and offer no user control. This paper proposes a user-tunable framework for dynamically trading accuracy against retrieval cost. It introduces a cooperative decision-making mechanism based on two classifiers and an interpretable control parameter α, enabling on-demand switching between high-accuracy and low-overhead retrieval modes. By dynamically routing queries to retrieval strategies, the approach is reported to achieve a Pareto-optimal balance between accuracy and retrieval cost across multiple benchmarks. Users can explicitly adjust α to customize the performance–efficiency trade-off, which makes deployment more flexible.
📝 Abstract
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model (LLM) hallucinations by incorporating external knowledge retrieval. However, existing RAG frameworks often apply retrieval indiscriminately, leading to inefficiencies: over-retrieving when retrieval is unnecessary, or failing to retrieve iteratively when complex reasoning requires it. Recent adaptive retrieval strategies do choose among these retrieval modes, but they base the choice only on query complexity and lack user-driven flexibility, making them ill-suited to diverse user application needs. In this paper, we introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off. Our approach leverages two classifiers: one trained to prioritize accuracy and another to prioritize retrieval efficiency. Via an interpretable control parameter $\alpha$, users can seamlessly navigate between minimal-cost retrieval and high-accuracy retrieval based on their specific requirements. We empirically demonstrate that our approach effectively balances accuracy, retrieval cost, and user controllability, making it a practical and adaptable solution for real-world applications.
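The abstract does not say how the two classifiers' outputs are combined, so the following is only an illustrative sketch of one way a single parameter could steer between them. It assumes each classifier outputs a probability distribution over three retrieval strategies and that the scores are blended linearly. The strategy names, the `route` function, the linear blend, and the convention that a larger $\alpha$ favors accuracy are all assumptions made for illustration, not the paper's method.

```python
from typing import Sequence

# Hypothetical strategy labels, ordered from cheapest to most expensive.
STRATEGIES = ("no_retrieval", "single_step", "multi_step")


def route(p_accuracy: Sequence[float], p_cost: Sequence[float], alpha: float) -> str:
    """Pick a retrieval strategy by blending two classifiers' scores.

    p_accuracy: scores from the accuracy-oriented classifier (assumed input).
    p_cost:     scores from the cost-oriented classifier (assumed input).
    alpha:      0.0 follows the cost-oriented classifier only,
                1.0 follows the accuracy-oriented classifier only
                (an assumed convention).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_accuracy) != len(STRATEGIES) or len(p_cost) != len(STRATEGIES):
        raise ValueError("each classifier must score every strategy")

    # Linear interpolation between the two score vectors (assumed rule).
    mixed = [alpha * a + (1.0 - alpha) * c for a, c in zip(p_accuracy, p_cost)]

    # Choose the strategy with the highest blended score.
    best = max(range(len(mixed)), key=mixed.__getitem__)
    return STRATEGIES[best]


if __name__ == "__main__":
    # Made-up scores for a single query, purely for demonstration.
    p_acc = [0.1, 0.3, 0.6]
    p_cost = [0.7, 0.2, 0.1]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, route(p_acc, p_cost, alpha))
```

Under these assumptions, sweeping `alpha` from 0 to 1 moves the routing decision for a given query from the cheaper strategy toward the more retrieval-intensive one, which is the kind of user-facing dial the abstract describes.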