🤖 AI Summary
In retrieval-augmented generation (RAG), heterogeneous embedding models exhibit inconsistent performance across domains, leading to biased similarity computation and degraded LLM response quality. Method: This paper proposes Confident RAG, a training-free, plug-and-play multi-embedding fusion framework. Its core innovations are: (1) standardized similarity fusion combined with collaborative multi-embedding retrieval ranking; and (2) a generation-confidence-based dynamic model selection mechanism that adaptively identifies the optimal embedding path across reasoning steps. Contribution/Results: Evaluated on cross-domain benchmarks, Confident RAG achieves average improvements of ~10% over base LLMs and ~5% over standard RAG. It demonstrates strong stability and generalization, effectively mitigating the domain dependency inherent in single-embedding models while preserving computational efficiency and deployment simplicity.
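The standardized similarity fusion described above can be sketched roughly as follows. This is not the paper's released code: the function names, the z-score normalization, and the max-score merging rule are all assumptions about how "sorting retrievals from multiple embedding models based on standardized similarity" might look in practice.

```python
# Hypothetical sketch of standardized similarity fusion across embedding models.
# Raw cosine similarities from different models live on different scales, so
# each model's scores are z-score normalized before they are merged and ranked.
import statistics


def standardize(scores):
    """Z-score normalize one model's similarity scores so they are comparable."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return [(s - mean) / std for s in scores]


def mixture_embedding_rank(model_scores, top_k=3):
    """model_scores: {model_name: {doc_id: raw_similarity}}.

    Standardize per model, keep each document's best standardized score
    across models, and return the top_k documents overall.
    """
    best = {}
    for model, doc_scores in model_scores.items():
        docs = list(doc_scores)
        z = standardize([doc_scores[d] for d in docs])
        for doc, score in zip(docs, z):
            best[doc] = max(best.get(doc, float("-inf")), score)
    return sorted(best, key=best.get, reverse=True)[:top_k]
```

For example, a document ranked mid-pack by one model but top-ranked by another keeps its stronger (standardized) score, which is one plausible way multiple retrievers can collaborate without any training.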
📄 Abstract
Recently, as Large Language Models (LLMs) have fundamentally impacted various fields, methods for incorporating up-to-date information into LLMs or adding external knowledge to construct domain-specific models have garnered wide attention. Retrieval-Augmented Generation (RAG), serving as an inference-time scaling method, is notable for its low cost and minimal parameter-tuning effort. However, due to heterogeneous training data and model architectures, the various embedding models used in RAG exhibit different strengths across domains, often producing different similarity calculation results and, consequently, varying response quality from LLMs. To address this problem, we propose and examine two approaches that enhance RAG by combining the benefits of multiple embedding models, named Mixture-Embedding RAG and Confident RAG. Mixture-Embedding RAG simply sorts and selects retrievals from multiple embedding models based on standardized similarity; however, it does not outperform vanilla RAG. In contrast, Confident RAG generates responses multiple times using different embedding models and then selects the response with the highest confidence level, demonstrating average improvements of approximately 10% and 5% over vanilla LLMs and RAG, respectively. The consistent results across different LLMs and embedding models indicate that Confident RAG is an efficient plug-and-play approach for various domains. We will release our code upon publication.
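The generate-then-select loop of Confident RAG can be illustrated with a minimal sketch. The interfaces here (`retrieve`, `generate`, `confidence`) are placeholders for whatever retriever, LLM call, and confidence measure one uses; the abstract does not specify the confidence metric, so a common proxy such as mean token log-probability is one plausible choice.

```python
# Hypothetical sketch of Confident RAG: run the full RAG pipeline once per
# embedding model, then keep the answer the LLM was most confident about.
# `retrieve`, `generate`, and `confidence` are assumed callables, not the
# paper's actual implementation.


def confident_rag(query, embedding_models, retrieve, generate, confidence):
    """Return the candidate answer with the highest generation confidence.

    retrieve(query, model)  -> context retrieved with that embedding model
    generate(query, context) -> LLM response conditioned on the context
    confidence(answer)       -> scalar confidence score for the response
    """
    candidates = []
    for model in embedding_models:
        context = retrieve(query, model)
        answer = generate(query, context)
        candidates.append((confidence(answer), answer))
    # Select the most confident response across embedding-model paths.
    return max(candidates, key=lambda pair: pair[0])[1]
```

Because each embedding model is queried independently, the selection step adds no training cost, matching the abstract's framing of the method as plug-and-play at inference time.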