🤖 AI Summary
Problem: Automated conversion of tabular data into graph structures lacks standardized methodologies, dedicated benchmarks, and robust graph-generation capabilities. Method: This paper is the first to formalize graph construction as a learnable task, and it proposes AutoG, a fully automated graph schema generation framework powered by large language models (LLMs). AutoG eliminates reliance on handcrafted rules or domain expertise by integrating LLM-driven graph-schema reasoning with structured prompt engineering. Contribution/Results: The paper introduces the first benchmark dataset specifically designed for graph construction. Experiments demonstrate that graphs generated by AutoG match or approach the performance of expert-designed graphs across multiple downstream tasks, yielding significant accuracy improvements for graph neural networks (GNNs). These results empirically validate the critical impact of high-quality graph structure on GNN performance.
📝 Abstract
Recent years have witnessed significant advancements in graph machine learning (GML), with applications spanning numerous domains. However, GML research has focused predominantly on developing powerful models, often overlooking a crucial initial step: constructing suitable graphs from common data formats, such as tabular data. This construction process is fundamental to applying graph-based models, yet it remains largely understudied and lacks formalization. Our research aims to address this gap by formalizing the graph construction problem and proposing an effective solution. We identify two critical challenges to achieving this goal: (1) the absence of dedicated datasets to formalize and evaluate the effectiveness of graph construction methods, and (2) the limited applicability of existing automatic construction methods, which handle only some specific cases, so that tedious human engineering is still required to generate high-quality graphs. To tackle these challenges, we present a two-fold contribution. First, we introduce a set of datasets to formalize and evaluate graph construction methods. Second, we propose an LLM-based solution, AutoG, which automatically generates high-quality graph schemas without human intervention. The experimental results demonstrate that the quality of constructed graphs is critical to downstream task performance, and that AutoG can generate high-quality graphs that rival those produced by human experts.
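To make the idea of constructing a graph from tabular data concrete, the following is a minimal sketch of one simple, rule-based construction: each table becomes a node type and each foreign-key column becomes an edge type. This is not AutoG's method, which the abstract describes only at a high level. The tables, the `schema` layout, and the `build_graph` helper are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy tables; names and columns are illustrative only.
tables = {
    "users": [
        {"user_id": 1, "name": "Ann"},
        {"user_id": 2, "name": "Bob"},
    ],
    "orders": [
        {"order_id": 10, "user_id": 1, "amount": 30.0},
        {"order_id": 11, "user_id": 1, "amount": 12.5},
        {"order_id": 12, "user_id": 2, "amount": 7.0},
    ],
}

# Hypothetical graph schema: every table is a node type identified by its
# primary key, and every foreign key defines an edge type.
schema = {
    "primary_keys": {"users": "user_id", "orders": "order_id"},
    # (source table, foreign-key column, target table)
    "foreign_keys": [("orders", "user_id", "users")],
}


def build_graph(tables, schema):
    """Turn tables into typed nodes and typed edge lists according to the schema."""
    # One node per row, indexed by node type and primary-key value.
    nodes = {}
    for table, pk in schema["primary_keys"].items():
        nodes[table] = {row[pk]: row for row in tables[table]}

    # One edge per foreign-key reference that points at an existing row.
    edges = defaultdict(list)
    for src, col, dst in schema["foreign_keys"]:
        src_pk = schema["primary_keys"][src]
        for row in tables[src]:
            if row[col] in nodes[dst]:
                edges[(src, col, dst)].append((row[src_pk], row[col]))
    return nodes, dict(edges)


nodes, edges = build_graph(tables, schema)
```

In this sketch the schema is written by hand, which is the kind of manual engineering the abstract describes as tedious. Different schema choices for the same tables (for example, treating a column's values as nodes rather than as features) produce different graphs, which is why the quality of the schema can matter for downstream models.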