GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

📅 2024-03-07
🏛️ arXiv.org
📈 Citations: 44
Influential: 6
🤖 AI Summary
Large language models (LLMs) exhibit limited capability in graph understanding and multi-step reasoning. Method: This paper introduces GraphInstruct, a comprehensive benchmark for evaluating and enhancing LLMs' graph reasoning abilities, covering 21 classical graph tasks with diverse graph generation pipelines and detailed reasoning steps. Building on it, the authors construct GraphLM, an instruction-tuned model with strong graph understanding, and GraphLM+, which adds a step mask training strategy to enable multi-step graph reasoning. Contribution/Results: Experiments demonstrate that GraphLM and GraphLM+ outperform other LLMs across the 21 tasks. The code for generating GraphInstruct is open-sourced to advance research at the intersection of graph data mining and LLMs.

📝 Abstract
Evaluating and enhancing the general capabilities of large language models (LLMs) has been an important research topic. Graphs are a common data structure in the real world, and understanding graph data is crucial for advancing general intelligence. To evaluate and enhance the graph understanding abilities of LLMs, in this paper, we propose a benchmark named GraphInstruct, which comprehensively includes 21 classical graph reasoning tasks, providing diverse graph generation pipelines and detailed reasoning steps. Based on GraphInstruct, we further construct GraphLM through efficient instruction-tuning, which shows prominent graph understanding capability. To further enhance the LLM with graph reasoning capability, we propose a step mask training strategy and construct a model named GraphLM+. As one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, this work demonstrates through extensive experiments the superiority of GraphLM and GraphLM+ over other LLMs. We look forward to more researchers exploring the potential of LLMs in the graph data mining domain through GraphInstruct. Our code for generating GraphInstruct is released publicly at: https://github.com/CGCL-codes/GraphInstruct.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' ability to understand graph-structured data
Developing models for multi-step reasoning on graph tasks
Creating benchmark and training methods for graph reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark (GraphInstruct) with 21 classical graph reasoning tasks
Efficient instruction-tuning for graph understanding capability (GraphLM)
Step mask training strategy for multi-step graph reasoning (GraphLM+)
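The benchmark's task instances pair a generated graph with a question, step-by-step reasoning, and an answer. The sketch below is purely illustrative, not the paper's actual pipeline: the function name, the choice of shortest-path as the example task, and the step format are all hypothetical, using only a BFS-derived reasoning trace.

```python
import random

def generate_shortest_path_instance(n_nodes=6, edge_prob=0.5, seed=0):
    """Build a random undirected graph plus a shortest-path question,
    recording BFS frontier expansions as reasoning steps (illustrative)."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)
             if rng.random() < edge_prob]
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    src, dst = 0, n_nodes - 1
    dist = {src: 0}
    steps = []
    frontier = [src]
    # Breadth-first search; each frontier expansion becomes one "step"
    while frontier and dst not in dist:
        nxt = []
        for u in frontier:
            for w in sorted(adj[u]):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    nxt.append(w)
        steps.append(f"Expand frontier {frontier} -> discover {nxt}")
        frontier = nxt

    answer = dist.get(dst, -1)  # -1 if dst is unreachable
    prompt = (f"Given an undirected graph with edges {edges}, "
              f"what is the length of the shortest path from {src} to {dst}?")
    return {"prompt": prompt, "steps": steps, "answer": answer}
```

Instances of this form can be serialized into instruction-tuning examples, with the steps supervising intermediate reasoning and the answer supervising the final prediction.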
Zihan Luo
PhD Student of Data Science, HKUST
Route Planning
Xiran Song
Huazhong University of Science and Technology, Wuhan, China
Hong Huang
Huazhong University of Science and Technology, Wuhan, China
Jianxun Lian
Microsoft Research Asia
LLM Agent, Anthropomorphic Intelligence, User Modeling, Recommendation System
Chenhao Zhang
Huazhong University of Science and Technology, Wuhan, China
Jinqi Jiang
Huazhong University of Science and Technology, Wuhan, China
Xing Xie
Microsoft Research Asia, Beijing, China
Hai Jin
Huazhong University of Science and Technology
Parallel and Distributed Computing, Computer Architecture, Cloud Computing, P2P