🤖 AI Summary
In retrieval-augmented generation (RAG), heterogeneous embedding models exhibit inconsistent performance across domains, leading to biased similarity computation and degraded LLM response quality. Method: This paper proposes Confident RAG, a training-free, plug-and-play multi-embedding fusion framework. Its core innovations are: (1) standardized similarity fusion combined with collaborative multi-embedding retrieval ranking; and (2) a generation-confidence-based dynamic model selection mechanism that adaptively identifies the optimal embedding path across reasoning steps. Contribution/Results: Evaluated on cross-domain benchmarks, Confident RAG achieves average improvements of ~10% over base LLMs and ~5% over standard RAG. It demonstrates strong stability and generalization, effectively mitigating the domain dependency inherent in single-embedding models while preserving computational efficiency and deployment simplicity.
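The standardized similarity fusion described above can be sketched roughly as follows. This is not the paper's released code: the function names, the z-score normalization, and the max-score merging rule are all assumptions about how "sorting retrievals from multiple embedding models based on standardized similarity" might look in practice.

```python
# Hypothetical sketch of standardized similarity fusion across embedding models.
# Raw cosine similarities from different models live on different scales, so
# each model's scores are z-score normalized before they are merged and ranked.
import statistics


def standardize(scores):
    """Z-score normalize one model's similarity scores so they are comparable."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return [(s - mean) / std for s in scores]


def mixture_embedding_rank(model_scores, top_k=3):
    """model_scores: {model_name: {doc_id: raw_similarity}}.

    Standardize per model, keep each document's best standardized score
    across models, and return the top_k documents overall.
    """
    best = {}
    for model, doc_scores in model_scores.items():
        docs = list(doc_scores)
        z = standardize([doc_scores[d] for d in docs])
        for doc, score in zip(docs, z):
            best[doc] = max(best.get(doc, float("-inf")), score)
    return sorted(best, key=best.get, reverse=True)[:top_k]
```

For example, a document ranked mid-pack by one model but top-ranked by another keeps its stronger (standardized) score, which is one plausible way multiple retrievers can collaborate without any training.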
📄 Abstract
Recently, as Large Language Models (LLMs) have fundamentally impacted various fields, methods for incorporating up-to-date information into LLMs or adding external knowledge to construct domain-specific models have garnered wide attention. Retrieval-Augmented Generation (RAG), serving as an inference-time scaling method, is notable for its low cost and minimal parameter-tuning effort. However, due to heterogeneous training data and model architectures, the various embedding models used in RAG exhibit different strengths across domains, often producing different similarity calculation results and, consequently, varying response quality from LLMs. To address this problem, we propose and examine two approaches that enhance RAG by combining the benefits of multiple embedding models, named Mixture-Embedding RAG and Confident RAG. Mixture-Embedding RAG simply sorts and selects retrievals from multiple embedding models based on standardized similarity; however, it does not outperform vanilla RAG. In contrast, Confident RAG generates responses multiple times using different embedding models and then selects the response with the highest confidence level, demonstrating average improvements of approximately 10% and 5% over vanilla LLMs and RAG, respectively. The consistent results across different LLMs and embedding models indicate that Confident RAG is an efficient plug-and-play approach for various domains. We will release our code upon publication.
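The generate-then-select loop of Confident RAG can be illustrated with a minimal sketch. The interfaces here (`retrieve`, `generate`, `confidence`) are placeholders for whatever retriever, LLM call, and confidence measure one uses; the abstract does not specify the confidence metric, so a common proxy such as mean token log-probability is one plausible choice.

```python
# Hypothetical sketch of Confident RAG: run the full RAG pipeline once per
# embedding model, then keep the answer the LLM was most confident about.
# `retrieve`, `generate`, and `confidence` are assumed callables, not the
# paper's actual implementation.


def confident_rag(query, embedding_models, retrieve, generate, confidence):
    """Return the candidate answer with the highest generation confidence.

    retrieve(query, model)  -> context retrieved with that embedding model
    generate(query, context) -> LLM response conditioned on the context
    confidence(answer)       -> scalar confidence score for the response
    """
    candidates = []
    for model in embedding_models:
        context = retrieve(query, model)
        answer = generate(query, context)
        candidates.append((confidence(answer), answer))
    # Select the most confident response across embedding-model paths.
    return max(candidates, key=lambda pair: pair[0])[1]
```

Because each embedding model is queried independently, the selection step adds no training cost, matching the abstract's framing of the method as plug-and-play at inference time.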