Efficient RAG with Intent-Aware Retrieval and Semantics-Preserving Chunking

📅 2026-05-31

🤖 AI Summary
Traditional retrieval-augmented generation (RAG) systems struggle to support complex reasoning because retrieval is intent-agnostic and evidence is fragmented across chunks. This work proposes InSemRAG, a framework built on an iterative retrieve-and-check mechanism with two supporting modules: intent-aware retrieval (IAR), which weights retrieval channels according to query intent, and semantics-preserving chunking (SPC), which detects and repairs damaged evidence chunks. Small language models (SLMs) offset the latency added by the iterative loop. Experimental results show that InSemRAG improves F1 by 2.65 points on HotPotQA and accuracy by 1.5 points on FEVER, and performs competitively with Multi-Hop RAG at 4.32× lower latency.
📝 Abstract
The demand for strong instruction-following and reasoning capabilities in large language models (LLMs) has driven rapid development of retrieval-augmented generation (RAG). A RAG system assists LLM generation by retrieving chunks of supplementary knowledge that fit the query from an external database. Conventional RAG systems, however, suffer from insufficient information for two reasons: intent-agnostic retrieval and information fragmentation. Our work proposes a RAG framework, termed InSemRAG, that addresses these challenges via an iterative retrieve-and-check mechanism with two supporting modules, an intent-aware retriever (IAR) and semantics-preserving chunking (SPC). IAR implements a dynamic hybrid retrieval method that adaptively weights the retrieval channels based on the query intent, while SPC detects and repairs damaged evidence chunks to preserve their semantic integrity. To reduce the computational latency introduced by the iterative mechanism, we leverage small language models (SLMs). Extensive experiments across several benchmark datasets consistently demonstrate that our method is competitive with recent state-of-the-art RAG mechanisms. In particular, it achieves clear gains on multi-hop and evidence-sensitive tasks, with a 2.65-point improvement in F1 on HotPotQA and a 1.5-point increase in accuracy on FEVER. With SLMs, our method also performs competitively with Multi-Hop RAG at 4.32× lower latency.
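The abstract describes IAR as a hybrid retriever that weights its retrieval channels according to query intent. The following is a minimal sketch of that general idea, not the authors' implementation: the intent labels, the two channels (`sparse` and `dense`), the weight values, and the function names `fuse_scores` and `rank` are all assumptions made for illustration, and the scores are assumed to be already normalized to a common range.

```python
from typing import Dict, List

# Hypothetical intent labels and channel weights (assumed for illustration;
# the abstract does not state which intents or weights InSemRAG uses).
INTENT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "factoid": {"sparse": 0.7, "dense": 0.3},
    "multi_hop": {"sparse": 0.3, "dense": 0.7},
}
DEFAULT_WEIGHTS: Dict[str, float] = {"sparse": 0.5, "dense": 0.5}


def fuse_scores(
    intent: str,
    sparse_scores: Dict[str, float],
    dense_scores: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-chunk scores from two channels with intent-dependent weights."""
    weights = INTENT_WEIGHTS.get(intent, DEFAULT_WEIGHTS)
    chunk_ids = set(sparse_scores) | set(dense_scores)
    # A chunk missing from one channel contributes 0.0 for that channel.
    return {
        cid: weights["sparse"] * sparse_scores.get(cid, 0.0)
        + weights["dense"] * dense_scores.get(cid, 0.0)
        for cid in chunk_ids
    }


def rank(fused: Dict[str, float], k: int) -> List[str]:
    """Return the top-k chunk ids by fused score, ties broken by chunk id."""
    return sorted(fused, key=lambda cid: (-fused[cid], cid))[:k]


if __name__ == "__main__":
    sparse = {"a": 1.0, "b": 0.2}
    dense = {"b": 0.9, "c": 0.8}
    for intent in ("factoid", "multi_hop"):
        print(intent, rank(fuse_scores(intent, sparse, dense), k=3))
```

In this sketch the same candidate chunks are ranked differently depending on the detected intent, which is the behavior the abstract attributes to adaptive channel weighting. How the intent is detected and how the weights are set in InSemRAG is not specified in the text above.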
Problem

Research questions and friction points this paper is trying to address.

retrieval-augmented generation
intent-agnostic retrieval
information fragmentation
semantic integrity
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intent-Aware Retrieval
Semantics-Preserving Chunking
Retrieval-Augmented Generation
Iterative Retrieve-and-Check
Small Language Models
Fachrina Dewi Puspitasari
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Chaoning Zhang
Professor at UESTC (University of Electronic Science and Technology of China)
Computer Vision, LLM and VLM, GenAI and AIGC Detection
Jiaquan Zhang
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Zhicheng Wang
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Hafiz Shakeel Ahmad Awan
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Rizwan Qureshi
Center for Research in Computer Vision (CRCV), University of Central Florida, Orlando, USA
Cancer Data Science, Responsible AI, Computer Vision, Bioinformatics, Machine Learning
Jewon Lee
Nota Inc.
AI
Tae-Ho Kim
Nota Inc.
Yang Yang
School of Computer Science and Engineering, University of Electronic Science and Technology of China