🤖 AI Summary
Traditional RAG systems treat text chunks as atomic units, limiting their capacity for multi-hop question answering. While GraphRAG incorporates knowledge graphs to model relational structure, it incurs an orders-of-magnitude increase in computational cost for index construction and relies on heuristic retrieval strategies. This work proposes UnWeaver, a novel framework that eschews explicit graph construction by using large language models to decompose documents into entities that can span multiple text chunks. UnWeaver introduces an entity-to-chunk mapping that enables entity-mediated recovery of the original content during retrieval. By using entities as intermediaries, the approach preserves fidelity to the source material while effectively supporting multi-hop reasoning. The method reduces system complexity and noise, achieving performance comparable to GraphRAG with a substantially simpler and more efficient architecture.
📝 Abstract
A key problem in retrieval-augmented generation (RAG) systems is that chunk-based retrieval pipelines represent source chunks as atomic objects, collapsing all the information within a chunk into a single vector. These vector representations are then treated as isolated, independent, and self-sufficient, with no attempt to represent possible relations between them; such an approach has no dedicated mechanism for handling multi-hop questions. Graph-based RAG systems aim to ameliorate this problem by modeling information as knowledge graphs, in which entities are represented by nodes, connected by relations, and organized into hierarchical communities. This approach, however, suffers from issues of its own, among them an orders-of-magnitude increase in computational complexity for building graph-based indices and a reliance on heuristics for retrieval. We propose UnWeaver, a novel RAG framework that simplifies the idea of GraphRAG. UnWeaver uses an LLM to disentangle the contents of documents into entities that can occur across multiple chunks. During retrieval, entities serve as intermediaries for recovering the original text chunks, thereby preserving fidelity to the source material. We argue that entity-based decomposition yields a more distilled representation of the original information and additionally reduces noise in the indexing and generation processes.
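The entity-to-chunk mapping described above can be sketched as an inverted index from entities to the chunks that mention them. The following is a minimal illustration, not the paper's implementation: the chunk texts, the per-chunk entity sets (standing in for LLM-extracted entities), and the `retrieve` helper are all hypothetical, and the real system would pair this with vector similarity and LLM-based extraction.

```python
from collections import defaultdict

# Hypothetical corpus of text chunks (illustrative data, not from the paper).
chunks = {
    "c1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "c2": "The Nobel Prize in Physics is awarded in Stockholm.",
    "c3": "Marie Curie was born in Warsaw.",
}

# Stand-in for LLM-extracted entities per chunk.
chunk_entities = {
    "c1": {"Marie Curie", "Nobel Prize in Physics"},
    "c2": {"Nobel Prize in Physics", "Stockholm"},
    "c3": {"Marie Curie", "Warsaw"},
}

# Build the entity -> chunk-IDs inverted index. An entity extracted from
# several chunks maps to all of them, linking content across chunk boundaries.
entity_to_chunks = defaultdict(set)
for cid, entities in chunk_entities.items():
    for entity in entities:
        entity_to_chunks[entity].add(cid)

def retrieve(query_entities):
    """Recover original chunks via the entities touched by the query."""
    hit_ids = set()
    for entity in query_entities:
        hit_ids |= entity_to_chunks.get(entity, set())
    # Return the original chunk texts, preserving fidelity to the source.
    return [chunks[cid] for cid in sorted(hit_ids)]

# A multi-hop query such as "Where is the prize Marie Curie won awarded?"
# touches both "Marie Curie" and "Nobel Prize in Physics", so the shared
# entity bridges chunks c1 and c2 without any explicit graph edges.
results = retrieve({"Marie Curie", "Nobel Prize in Physics"})
```

Note that the shared entity, rather than a constructed graph edge, is what connects the chunks: the mapping is cheap to build and the retrieved output is always verbatim source text.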