Each to Their Own: Exploring the Optimal Embedding in RAG

πŸ“… 2025-07-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In retrieval-augmented generation (RAG), heterogeneous embedding models perform inconsistently across domains, producing divergent similarity scores and, in turn, degraded LLM response quality. Method: This paper proposes two training-free, plug-and-play strategies for combining multiple embedding models: (1) Mixture-Embedding RAG, which standardizes similarity scores across embedding models and merges their retrieval rankings; and (2) Confident RAG, which generates a response with each embedding model and selects the one with the highest generation confidence. Contribution/Results: On cross-domain benchmarks, Mixture-Embedding RAG does not outperform vanilla RAG, while Confident RAG achieves average improvements of ~10% over base LLMs and ~5% over standard RAG. Its gains are consistent across different LLMs and embedding models, mitigating the domain dependency of single-embedding retrieval while preserving computational efficiency and deployment simplicity.
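The confidence-based selection described above can be sketched in a few lines. This is a minimal illustration with assumed interfaces, not the authors' implementation (their code is unreleased): `retrievers`, `generate`, and `confidence` are hypothetical stand-ins, and the confidence scorer would in practice be something like mean token log-probability.

```python
from typing import Callable, Dict, List, Tuple

def confident_rag(
    query: str,
    retrievers: List[Callable[[str], str]],  # one retriever per embedding model
    generate: Callable[[str, str], str],     # LLM call: (query, context) -> answer
    confidence: Callable[[str], float],      # e.g. mean token log-prob (assumed)
) -> Tuple[str, float]:
    """Generate one answer per embedding model; return the most confident one."""
    candidates = []
    for retrieve in retrievers:
        context = retrieve(query)           # retrieval under this embedding model
        answer = generate(query, context)   # answer conditioned on that context
        candidates.append((answer, confidence(answer)))
    return max(candidates, key=lambda c: c[1])

# Toy stand-ins: two "embedding models" that retrieve different contexts.
retrievers = [lambda q: "context-A", lambda q: "context-B"]
generate = lambda q, ctx: f"answer from {ctx}"
confidence = lambda ans: 0.9 if "B" in ans else 0.4  # pretend confidence score

best_answer, best_score = confident_rag("example query", retrievers, generate, confidence)
```

The key design point is that the embedding models never need to agree: each one drives an independent retrieval-and-generation pass, and only the final answers compete.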

πŸ“ Abstract
Recently, as Large Language Models (LLMs) have fundamentally impacted various fields, the methods for incorporating up-to-date information into LLMs or adding external knowledge to construct domain-specific models have garnered wide attention. Retrieval-Augmented Generation (RAG), serving as an inference-time scaling method, is notable for its low cost and minimal effort for parameter tuning. However, due to heterogeneous training data and model architecture, the variant embedding models used in RAG exhibit different benefits across various areas, often leading to different similarity calculation results and, consequently, varying response quality from LLMs. To address this problem, we propose and examine two approaches to enhance RAG by combining the benefits of multiple embedding models, named Mixture-Embedding RAG and Confident RAG. Mixture-Embedding RAG simply sorts and selects retrievals from multiple embedding models based on standardized similarity; however, it does not outperform vanilla RAG. In contrast, Confident RAG generates responses multiple times using different embedding models and then selects the responses with the highest confidence level, demonstrating average improvements of approximately 10% and 5% over vanilla LLMs and RAG, respectively. The consistent results across different LLMs and embedding models indicate that Confident RAG is an efficient plug-and-play approach for various domains. We will release our code upon publication.
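The abstract's Mixture-Embedding RAG sorts retrievals from multiple embedding models by standardized similarity. The sketch below illustrates one plausible reading, with assumptions flagged: similarities from each model are z-score standardized so they are comparable, and a max-over-models fusion rule is used (the paper's exact merge rule is not specified here).

```python
import statistics
from typing import Dict, List

def mixture_embedding_rank(
    sims_per_model: List[Dict[str, float]],  # per model: doc_id -> raw similarity
    top_k: int = 3,
) -> List[str]:
    """Standardize each model's similarities, fuse, and return top-k doc ids."""
    fused: Dict[str, float] = {}
    for sims in sims_per_model:
        mu = statistics.mean(sims.values())
        sigma = statistics.pstdev(sims.values()) or 1.0  # guard against zero spread
        for doc, s in sims.items():
            z = (s - mu) / sigma                  # z-score within this model
            fused[doc] = max(fused.get(doc, float("-inf")), z)  # assumed fusion rule
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

# Two models on very different similarity scales (cosine-like vs. unbounded).
sims_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
sims_b = {"d2": 10.0, "d1": 2.0, "d3": 1.0}
ranked = mixture_embedding_rank([sims_a, sims_b])
```

Standardization is what makes the merge meaningful: without it, the model with the larger raw similarity scale would dominate the ranking regardless of retrieval quality.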
Problem

Research questions and friction points this paper is trying to address.

Optimizing embedding models in RAG for diverse domains
Addressing varying similarity results from heterogeneous embeddings
Improving response quality via multi-embedding fusion strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines the benefits of multiple embedding models
Selects the response with the highest confidence level
Plug-and-play approach across diverse domains
πŸ”Ž Similar Papers
No similar papers found.
Shiting Chen
Faculty of Education, University of Hong Kong, Hong Kong, China
Zijian Zhao
Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Jinsong Chen
Central China Normal University
Graph Representation Learning · Graph Data Mining · AI for Education