🤖 AI Summary
Traditional RAG systems rely on a single retriever with fixed hyperparameters, so they adapt poorly to highly heterogeneous query distributions. This work instead selects a small, diverse portfolio of retrievers from a large candidate pool and formalizes the selection problem as an expected best-of-$k$ objective over the query distribution, for which it gives an efficient construction algorithm with near-optimality guarantees. Combined with a learned router, the resulting portfolios consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality across multiple question-answering benchmarks. Because the portfolio is fixed, retrieval and LLM calls can run in parallel, which gives comparable (and sometimes better) accuracy than inference-time hyperparameter tuning at substantially lower latency and token cost.
📝 Abstract
Retrieval-augmented generation (RAG) systems typically rely on a single retriever and a single set of hyperparameters, despite facing highly heterogeneous queries that range from simple factoid questions to complex multi-hop reasoning. We propose a method that automatically selects a small, diverse subset of retrievers (a portfolio) from a large pool of candidates, to cover different regions of the target query distribution. We formalize this setting via an expected best-of-$k$ objective over the query distribution and show that it admits an efficient portfolio construction algorithm with near-optimal guarantees. Across multiple QA benchmarks, our learned portfolios and router pipeline consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality. In addition, compared to inference-time hyperparameter tuning approaches, fixed portfolios enable parallel retrieval and LLM calls, achieving comparable (and sometimes better) accuracy with substantially lower latency and token cost.
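To make the expected best-of-$k$ objective concrete, the sketch below estimates it from a sample of queries and builds a portfolio greedily. This is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say which construction is used, and greedy selection is swapped in here because it is a standard choice for objectives of this shape. For non-negative scores, the mean over queries of the best score in the portfolio is monotone submodular, so greedy selection is known to reach at least a $(1 - 1/e)$ fraction of the best achievable value for a given $k$. The score matrix `scores[q][r]` (quality of retriever `r` on query `q`) and the function names are hypothetical.

```python
from typing import List, Sequence


def best_of_k_value(scores: Sequence[Sequence[float]], portfolio: Sequence[int]) -> float:
    """Empirical best-of-k objective: mean over queries of the best
    score achieved by any retriever in the portfolio."""
    if not portfolio:
        return 0.0
    return sum(max(row[r] for r in portfolio) for row in scores) / len(scores)


def greedy_portfolio(scores: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily add the retriever with the largest marginal gain.

    Assumes scores are non-negative, so the objective is monotone
    submodular. Ties are broken in favor of the lowest retriever index.
    """
    n_retrievers = len(scores[0])
    chosen: List[int] = []
    for _ in range(min(k, n_retrievers)):
        current = best_of_k_value(scores, chosen)
        best_gain, best_r = float("-inf"), -1
        for r in range(n_retrievers):
            if r in chosen:
                continue
            gain = best_of_k_value(scores, chosen + [r]) - current
            if gain > best_gain:
                best_gain, best_r = gain, r
        chosen.append(best_r)
    return chosen


# Toy data: retriever 0 handles the first three queries, retriever 2 the
# last two, and retriever 1 is redundant with retriever 0.
toy_scores = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
portfolio = greedy_portfolio(toy_scores, k=2)
```

On the toy data, the second pick is the retriever that covers the queries the first one misses, rather than the retriever with the next-best average score. That complementarity is what the best-of-$k$ objective rewards.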