🤖 AI Summary
This work systematically evaluates the practical capabilities of large language models (LLMs) on graph reasoning tasks—specifically graph description understanding, connectivity judgment, and shortest path finding—revealing a substantial gap between their theoretical expressivity and empirical performance. We introduce a controlled benchmark, structured input representations, and a human-annotated evaluation protocol to conduct rigorous empirical analysis on both synthetic graph reasoning tasks and real-world knowledge graphs. Our study uncovers a previously unreported structural failure mode: LLMs consistently fail to reconstruct accurate graph topologies from natural-language descriptions, exhibiting severe, asymmetric error patterns. Crucially, we bridge the theory–practice divide by quantifying these limitations and providing actionable insights for model improvement. All code, datasets, and evaluation protocols are publicly released, establishing a reproducible foundation and critical empirical baseline for developing graph-aware language models.
📝 Abstract
Large Language Models (LLMs) have achieved great success on various reasoning tasks. In this work, we focus on the graph reasoning ability of LLMs. Although theoretical studies have proved that LLMs are capable of handling graph reasoning tasks, empirical evaluations reveal numerous failures. To deepen our understanding of this discrepancy, we revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across all three of these fundamental tasks. Meanwhile, we perform a real-world investigation on knowledge graphs and make observations consistent with our findings. The code and datasets are available.
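To make the three tasks concrete, here is a minimal sketch of the ground-truth computations an LLM's answers would be checked against. The `u-v` edge-list description format and the function names are illustrative assumptions, not the paper's actual prompt format or evaluation code.

```python
from collections import deque

def parse_edges(description):
    # Graph description translation: turn a textual edge list
    # (illustrative "A-B, B-C" format) into an adjacency map.
    adj = {}
    for token in description.split(","):
        u, v = token.strip().split("-")
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path(adj, src, dst):
    # BFS covers the other two tasks: a returned path answers the
    # shortest-path question; None answers connectivity (not connected).
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

adj = parse_edges("A-B, B-C, C-D, E-F")
print(shortest_path(adj, "A", "D"))  # ['A', 'B', 'C', 'D']
print(shortest_path(adj, "A", "F"))  # None (A and F are disconnected)
```

Evaluating an LLM then amounts to comparing its free-text answers against these exact computations on controlled synthetic graphs.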