🤖 AI Summary
This work addresses the trade-off between retrieval efficiency and accuracy that Retrieval-Augmented Generation (RAG) systems face when operating over large-scale databases. The authors propose a progressive hierarchical retrieval algorithm that refines the candidate document set through a multi-stage search, advancing from low-dimensional to high-dimensional embeddings. At each stage, candidates are filtered by similarity before the search proceeds to the next, higher dimensionality, so the full-dimensional comparison is applied only to a reduced candidate set. According to the authors, this reduces retrieval time while preserving the desired accuracy, improving the scalability and overall performance of RAG systems.
📝 Abstract
Retrieval-Augmented Generation (RAG) is a promising technique for mitigating two key limitations of large language models (LLMs): outdated information and hallucinations. A RAG system stores documents as embedding vectors in a database. Given a query, a search is executed to find the most closely related documents. The top-matching documents are then inserted into the LLM's prompt to generate a response. Efficient and accurate search is therefore critical for RAG to retrieve relevant information. We propose a cost-effective search algorithm for the retrieval process. Our progressive search algorithm incrementally refines the candidate set through a hierarchy of searches, starting from low-dimensional embeddings and progressing to a higher target dimensionality. This multi-stage approach reduces retrieval time while preserving the desired accuracy. Our findings demonstrate that progressive search in RAG systems achieves a balance between dimensionality, speed, and accuracy, enabling scalable and high-performance retrieval even for large databases.
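The multi-stage refinement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the low-dimensional embeddings are produced, so this example assumes that a prefix of each full embedding is itself a usable lower-dimensional embedding. The function names, the stage dimensions in `dims`, and the per-stage candidate counts in `keep` are all hypothetical choices made for illustration.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length along the last axis (for cosine similarity)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def progressive_search(query, docs, dims=(64, 256, 1024), keep=(200, 50, 10)):
    """Return indices of the final candidates, best match first.

    Assumption (not from the paper): the first d components of an embedding
    serve as its d-dimensional version. dims and keep are illustrative values.
    """
    if len(dims) != len(keep):
        raise ValueError("dims and keep must have the same length")
    if dims[-1] > docs.shape[1]:
        raise ValueError("largest stage dimension exceeds embedding size")

    # Start with every document as a candidate.
    candidates = np.arange(docs.shape[0])
    for d, k in zip(dims, keep):
        # Compare only the first d dimensions of the surviving candidates.
        q = normalize(query[:d])
        sub = normalize(docs[candidates, :d])
        scores = sub @ q  # cosine similarity at this stage
        k = min(k, candidates.size)
        top = np.argsort(-scores)[:k]  # highest similarity first
        candidates = candidates[top]  # shrink the candidate set
    return candidates
```

Under these assumptions, a call such as `progressive_search(query, docs, dims=(32, 128, 256), keep=(100, 20, 5))` scores the whole database only at 32 dimensions, then rescores 100 and finally 20 candidates at higher dimensionality. The cost of the full-dimensional comparison is thus paid for a small subset only, which is the source of the speedup the abstract describes; how much accuracy is retained depends on how informative the low-dimensional embeddings are.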