Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current large language models (LLMs) struggle to handle unstructured literature-review tasks that require multi-step reasoning. To address this, we propose “Progressive Document Investigation,” a paradigm implemented as an end-to-end system: in the offline phase, web crawling and NLP techniques automatically construct a fact-dimension knowledge graph from large scholarly corpora; in the online phase, multi-hop graph traversal retrieval is combined with retrieval-augmented generation (RAG)-enhanced LLMs to enable interactive exploration and interpretable report generation. Our core contribution is a two-stage architecture that pairs graph-structured knowledge modeling with LLM reasoning, giving an automated pipeline from raw text to structured knowledge to citation-grounded reports. The system is showcased on a graph comprising over 50,000 papers and their citation relationships, illustrating how it supports iterative, traceable literature investigation.

📝 Abstract

Large Language Models (LLMs) have recently demonstrated remarkable performance in tasks such as Retrieval-Augmented Generation (RAG) and autonomous AI agent workflows. Yet, when faced with large sets of unstructured documents requiring progressive exploration, analysis, and synthesis, such as conducting a literature survey, existing approaches often fall short. We address this challenge -- termed Progressive Document Investigation -- by introducing Graphy, an end-to-end platform that automates data modeling, exploration, and high-quality report generation in a user-friendly manner. Graphy comprises an offline Scrapper that transforms raw documents into a structured graph of Fact and Dimension nodes, and an online Surveyor that enables iterative exploration and LLM-driven report generation. We showcase a pre-scraped graph of over 50,000 papers -- complete with their references -- demonstrating how Graphy facilitates the literature-survey scenario. The demonstration video can be found at https://youtu.be/uM4nzkAdGlM.
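
To make the Fact and Dimension terminology concrete, here is a minimal Python sketch of such a graph. It assumes that each paper is a Fact node and that extracted aspects (for example a challenge or a solution) are Dimension nodes attached to it. The class names, field names, sample data, and the `add_paper` and `dimensions_of` helpers are all hypothetical; they are not Graphy's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DimensionNode:
    """One extracted aspect of a document (hypothetical schema)."""
    kind: str  # illustrative label such as "challenge" or "solution"
    text: str


@dataclass
class FactNode:
    """One source document, assumed here to be a paper (hypothetical schema)."""
    paper_id: str
    title: str
    dimensions: list[DimensionNode] = field(default_factory=list)
    references: list[str] = field(default_factory=list)  # ids of cited papers


class DocumentGraph:
    """In-memory stand-in for the graph an offline extraction step might produce."""

    def __init__(self) -> None:
        self.facts: dict[str, FactNode] = {}

    def add_paper(self, paper_id, title, dimensions=(), references=()):
        # dimensions is an iterable of (kind, text) pairs.
        node = FactNode(
            paper_id,
            title,
            [DimensionNode(kind, text) for kind, text in dimensions],
            list(references),
        )
        self.facts[paper_id] = node
        return node

    def dimensions_of(self, paper_id, kind):
        # Return the texts of all dimensions of the given kind for one paper.
        return [d.text for d in self.facts[paper_id].dimensions if d.kind == kind]


# Made-up sample data for illustration only.
graph = DocumentGraph()
graph.add_paper("p1", "Paper One",
                dimensions=[("challenge", "slow joins")], references=["p2"])
graph.add_paper("p2", "Paper Two",
                dimensions=[("solution", "better join ordering")])
print(graph.dimensions_of("p1", "challenge"))
```

Keeping references on the Fact node and aspects on separate Dimension nodes is one simple way to let a later exploration step filter papers by aspect while still following citation links.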

Problem

Research questions and friction points this paper is trying to address.

Automating data modeling and exploration

Generating high-quality reports from unstructured data

Facilitating progressive document investigation with Graphy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated data modeling platform

Structured graph transformation

LLM-driven report generation
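
The explore-then-report flow named above can be sketched as follows. This is an illustration under stated assumptions, not Graphy's implementation: the toy `CITATIONS` and `TITLES` data, the breadth-first `expand` function, and the `build_prompt` helper are hypothetical, and no real LLM is called.

```python
from collections import deque

# Toy citation graph: paper id -> ids it references (made-up data).
CITATIONS = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
TITLES = {"a": "Paper A", "b": "Paper B", "c": "Paper C", "d": "Paper D"}


def expand(seed, max_hops):
    """Breadth-first traversal: papers reachable from seed within max_hops references."""
    seen = {seed: 0}  # paper id -> hop distance from the seed
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand past the hop limit
        for cited in CITATIONS.get(current, []):
            if cited not in seen:
                seen[cited] = seen[current] + 1
                queue.append(cited)
    return seen


def build_prompt(topic, paper_ids):
    """Assemble a prompt that asks for a report tied to bracketed source ids."""
    sources = "\n".join(f"[{pid}] {TITLES[pid]}" for pid in paper_ids)
    return (
        f"Write a short survey on {topic}.\n"
        f"Cite only these sources by their bracketed id:\n{sources}"
    )


selected = expand("a", max_hops=1)
prompt = build_prompt("graph analytics", sorted(selected))
print(prompt)
```

Restricting the prompt to the papers selected during exploration is one way to keep every citation in the generated report traceable to a node the user actually visited.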
