GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

📅 2024-03-07
🏛️ arXiv.org
📈 Citations: 44
Influential: 6
🤖 AI Summary
Large language models (LLMs) exhibit limited capability in graph understanding and multi-step reasoning. Method: This paper introduces GraphInstruct, a comprehensive benchmark for evaluating and enhancing LLMs' graph reasoning abilities, covering 21 classical graph tasks with diverse graph generation pipelines and detailed reasoning steps. Building on it, the authors construct GraphLM, an instruction-tuned model with strong graph understanding, and GraphLM+, which adds a step mask training strategy to enable multi-step graph reasoning. Contribution/Results: Experiments demonstrate that GraphLM and GraphLM+ outperform other LLMs across the 21 tasks. The code for generating GraphInstruct is open-sourced to advance research at the intersection of graph data mining and LLMs.

📝 Abstract
Evaluating and enhancing the general capabilities of large language models (LLMs) has been an important research topic. Graphs are a common data structure in the real world, and understanding graph data is crucial for advancing general intelligence. To evaluate and enhance the graph understanding abilities of LLMs, in this paper, we propose a benchmark named GraphInstruct, which comprehensively includes 21 classical graph reasoning tasks, providing diverse graph generation pipelines and detailed reasoning steps. Based on GraphInstruct, we further construct GraphLM through efficient instruction-tuning, which shows prominent graph understanding capability. To further enhance the LLM with graph reasoning capability, we propose a step mask training strategy and construct a model named GraphLM+. As one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, this work demonstrates through extensive experiments the superiority of GraphLM and GraphLM+ over other LLMs. We look forward to more researchers exploring the potential of LLMs in the graph data mining domain through GraphInstruct. Our code for generating GraphInstruct is released publicly at: https://github.com/CGCL-codes/GraphInstruct.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' ability to understand graph-structured data
Developing models for multi-step reasoning on graph tasks
Creating benchmark and training methods for graph reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark (GraphInstruct) with 21 classical graph reasoning tasks
Efficient instruction-tuning for graph understanding capability (GraphLM)
Step mask training strategy for multi-step graph reasoning (GraphLM+)
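The benchmark's task instances pair a generated graph with a question, step-by-step reasoning, and an answer. The sketch below is purely illustrative, not the paper's actual pipeline: the function name, the choice of shortest-path as the example task, and the step format are all hypothetical, using only a BFS-derived reasoning trace.

```python
import random

def generate_shortest_path_instance(n_nodes=6, edge_prob=0.5, seed=0):
    """Build a random undirected graph plus a shortest-path question,
    recording BFS frontier expansions as reasoning steps (illustrative)."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)
             if rng.random() < edge_prob]
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    src, dst = 0, n_nodes - 1
    dist = {src: 0}
    steps = []
    frontier = [src]
    # Breadth-first search; each frontier expansion becomes one "step"
    while frontier and dst not in dist:
        nxt = []
        for u in frontier:
            for w in sorted(adj[u]):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    nxt.append(w)
        steps.append(f"Expand frontier {frontier} -> discover {nxt}")
        frontier = nxt

    answer = dist.get(dst, -1)  # -1 if dst is unreachable
    prompt = (f"Given an undirected graph with edges {edges}, "
              f"what is the length of the shortest path from {src} to {dst}?")
    return {"prompt": prompt, "steps": steps, "answer": answer}
```

Instances of this form can be serialized into instruction-tuning examples, with the steps supervising intermediate reasoning and the answer supervising the final prediction.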
Zihan Luo
PhD Student of Data Science, HKUST
Route Planning
Xiran Song
Huazhong University of Science and Technology, Wuhan, China
Hong Huang
Huazhong University of Science and Technology, Wuhan, China
Jianxun Lian
Microsoft Research Asia
LLM Agent, Anthropomorphic Intelligence, User Modeling, Recommendation System
Chenhao Zhang
Huazhong University of Science and Technology, Wuhan, China
Jinqi Jiang
Huazhong University of Science and Technology, Wuhan, China
Xing Xie
Microsoft Research Asia, Beijing, China
Hai Jin
Huazhong University of Science and Technology
Parallel and Distributed Computing, Computer Architecture, Cloud Computing, P2P