🤖 AI Summary
Current large language models (LLMs) struggle to efficiently handle literature reviews over unstructured documents, which require multi-step reasoning. To address this, we propose “Progressive Document Investigation,” a paradigm implemented as an end-to-end system. In the offline phase, web crawling and NLP techniques automatically construct a fact-dimension knowledge graph from large scholarly corpora. In the online phase, multi-hop graph traversal retrieval is integrated with retrieval-augmented generation (RAG)-enhanced LLMs to enable interactive exploration and interpretable report generation. The core contribution is a two-stage architecture that combines graph-structured knowledge modeling with LLM reasoning, giving an automated pipeline from raw text to structured knowledge to citation-grounded reports. The system is demonstrated on a graph comprising over 50,000 papers and their citation relationships, where it supports iterative, traceable, and verifiable academic investigation.
📝 Abstract
Large Language Models (LLMs) have recently demonstrated remarkable performance on tasks such as Retrieval-Augmented Generation (RAG) and autonomous AI agent workflows. Yet when faced with large sets of unstructured documents that require progressive exploration, analysis, and synthesis, as in conducting a literature survey, existing approaches often fall short. We address this challenge, termed Progressive Document Investigation, by introducing Graphy, an end-to-end platform that automates data modeling, exploration, and high-quality report generation in a user-friendly manner. Graphy comprises an offline Scrapper that transforms raw documents into a structured graph of Fact and Dimension nodes, and an online Surveyor that enables iterative exploration and LLM-driven report generation. We showcase a pre-scrapped graph of over 50,000 papers, complete with their references, demonstrating how Graphy facilitates the literature-survey scenario. The demonstration video can be found at https://youtu.be/uM4nzkAdGlM.
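To make the two-stage idea concrete, the following is a minimal sketch of a graph of Fact and Dimension nodes with hop-by-hop exploration along references. It is not Graphy's actual data model or API: the class `FactDimensionGraph`, the functions `explore`, `report_context`, and `demo_graph`, the dimension name `challenge`, and the paper ids are all hypothetical, and the LLM call that would turn the collected context into a report is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FactDimensionGraph:
    """Toy graph (hypothetical): papers are Fact nodes, extracted views are Dimension nodes."""
    facts: dict = field(default_factory=dict)  # paper id -> title
    dimensions: dict = field(default_factory=lambda: defaultdict(dict))  # paper id -> {name: text}
    references: dict = field(default_factory=lambda: defaultdict(set))  # paper id -> cited paper ids

    def add_fact(self, paper_id, title):
        self.facts[paper_id] = title

    def add_dimension(self, paper_id, name, text):
        self.dimensions[paper_id][name] = text

    def add_reference(self, src, dst):
        self.references[src].add(dst)

    def expand(self, frontier):
        """Return known papers cited by any paper in the frontier."""
        cited = set()
        for paper_id in frontier:
            cited |= self.references.get(paper_id, set())
        return cited & set(self.facts)


def explore(graph, seeds, hops):
    """Iteratively grow the selection by following references for `hops` steps."""
    selected = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = graph.expand(frontier) - selected
        if not frontier:
            break
        selected |= frontier
    return selected


def report_context(graph, selected, dimension):
    """Collect one Dimension per selected paper as citation-tagged lines for a report prompt."""
    lines = []
    for paper_id in sorted(selected):
        text = graph.dimensions.get(paper_id, {}).get(dimension)
        if text:
            lines.append(f"[{paper_id}] {graph.facts[paper_id]}: {text}")
    return "\n".join(lines)


def demo_graph():
    """Build a three-paper graph with made-up content, for illustration only."""
    g = FactDimensionGraph()
    g.add_fact("p1", "Paper One")
    g.add_fact("p2", "Paper Two")
    g.add_fact("p3", "Paper Three")
    g.add_dimension("p1", "challenge", "scaling exploration")
    g.add_dimension("p2", "challenge", "noisy extraction")
    g.add_reference("p1", "p2")
    g.add_reference("p2", "p3")
    return g
```

In this sketch, `explore(demo_graph(), {"p1"}, 2)` follows references two hops out from a seed paper, and `report_context` gathers the chosen Dimension for each selected paper. Keeping the paper id on every line is one simple way to keep generated text traceable to its sources.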