🤖 AI Summary
This work addresses a trade-off in long-document retrieval-augmented generation (RAG): the granularity of retrieval units must balance contextual completeness against retrieval accuracy. Coarse-grained chunks introduce noise, while fine-grained chunks are harder to match to the query and suffer from low recall. To resolve this, the authors propose UMG-RAG, a training-free framework that adaptively fuses multi-granularity retrieval results for each input query. The method converts dense and sparse retrieval scores into evidence distributions, quantifies the uncertainty of each retriever-granularity pair via distribution entropy, and weights candidates by semantic, lexical, and granularity-specific confidence. A parent-chunk promotion variant (UMGP-RAG) additionally uses fine-grained hits to locate evidence and returns their broader parent chunks for local coherence. Because it modifies neither the underlying retriever nor the generator, UMG-RAG is a lightweight, plug-and-play approach that is reported to improve answer generation quality on question answering benchmarks.
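The entropy-based fusion summarized above can be illustrated with a minimal Python sketch. The summary does not give the exact formulas, so everything specific here is an assumption: scores are turned into a distribution with a softmax, confidence is taken as one minus normalized entropy, and the confidences are normalized into fusion weights. The function names (`evidence_distribution`, `confidence`, `fuse`) and the `temperature` parameter are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def evidence_distribution(scores, temperature=1.0):
    """Turn one score list {chunk_id: score} into a probability distribution.

    Assumption: a temperature softmax; the paper may normalize differently.
    """
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cid: math.exp((s - top) / temperature) for cid, s in scores.items()}
    total = sum(exps.values())
    return {cid: e / total for cid, e in exps.items()}


def confidence(dist):
    """Return 1 - normalized entropy: near 1 when peaked, 0 when uniform."""
    n = len(dist)
    if n <= 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    # Clamp to guard against tiny negative values from floating-point error.
    return max(0.0, 1.0 - entropy / math.log(n))


def fuse(score_lists):
    """Fuse score lists keyed by (expert, granularity).

    Each list contributes its distribution, weighted by its confidence.
    Returns (ranked candidates, per-list confidences).
    """
    dists, confs = {}, {}
    for key, scores in score_lists.items():
        if not scores:
            continue
        dists[key] = evidence_distribution(scores)
        confs[key] = confidence(dists[key])

    total_conf = sum(confs.values()) or 1.0  # avoid division by zero
    fused = defaultdict(float)
    for key, dist in dists.items():
        weight = confs[key] / total_conf
        for cid, p in dist.items():
            fused[cid] += weight * p

    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, confs
```

In this sketch a retriever-granularity pair whose scores are nearly flat (high entropy) contributes little to the fused ranking, while a pair with a sharply peaked score list dominates it. This mirrors the idea of treating granularity as a query-specific reliability estimate, without claiming to reproduce the authors' weighting scheme.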
📝 Abstract
Retrieval-augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer-bearing evidence and worsen long-context utilization. Fine-grained units are more compact, but they may be difficult to retrieve reliably because short chunks can lack the semantic, lexical, or bridging cues needed to match the query. We propose Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that treats the choice of chunk granularity as a query-specific reliability estimation problem. Instead of training a new retriever or modifying the generator, UMG-RAG uses existing dense and sparse retrievers as complementary experts across multiple chunk granularities. For each query, it converts the score list of each expert-granularity pair into an evidence distribution, estimates reliability from the entropy of that distribution, and fuses candidates according to query-specific semantic, lexical, and granularity confidence. We further introduce UMGP-RAG, a parent-promotion variant that uses fine-grained hits to locate relevant evidence while returning broader, non-redundant parent chunks for local coherence. Experiments on question answering benchmarks show that uncertainty-aware fusion and parent promotion improve generation quality while maintaining a lightweight, plug-and-play retrieval pipeline.
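The parent-promotion step can be pictured with a short Python sketch. The abstract does not specify how child scores are aggregated or how redundancy is removed, so this example makes two assumptions: each parent inherits the best score among its retrieved children, and "non-redundant" means each parent is returned at most once. The function `promote_to_parents` and the `parent_of` mapping are hypothetical names introduced only for illustration.

```python
def promote_to_parents(ranked_hits, parent_of, top_k=3):
    """Map fine-grained hits to deduplicated parent chunks.

    ranked_hits: list of (child_id, score) pairs from fine-grained retrieval.
    parent_of:   dict mapping child_id -> parent_id (assumed to be available
                 from the chunking step).
    Returns up to top_k parent ids, ordered by their best child score.
    """
    parent_scores = {}
    for child_id, score in ranked_hits:
        # A chunk with no known parent is kept as its own parent.
        parent_id = parent_of.get(child_id, child_id)
        # Assumption: a parent is scored by its best-matching child.
        if parent_id not in parent_scores or score > parent_scores[parent_id]:
            parent_scores[parent_id] = score

    ranked = sorted(parent_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [parent_id for parent_id, _ in ranked[:top_k]]
```

For example, if two sentence-level hits share the same paragraph, this sketch returns that paragraph once, so the generator sees the surrounding context without duplicated text. Other aggregation rules, such as summing child scores, would fit the abstract's description equally well.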