GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Graph Foundation Models (GFMs) are hindered by the scarcity of real-world graph data. To address this low-data regime, we propose GraphMaster, the first multi-agent graph synthesis framework designed specifically for data-scarce scenarios. Our method orchestrates four specialized large language model (LLM) agents (Manager, Perception, Enhancement, and Evaluation) in a multi-stage iterative refinement process that jointly targets semantic richness, topological validity, and high-fidelity textual attributes under explicit graph-structural constraints. We introduce data-limited "Sub" variants of six standard graph benchmarks and design an interpretable evaluation framework that combines human assessment with Grassmann manifold-based semantic consistency metrics. Experiments on these low-data subsets show that our approach significantly outperforms conventional graph synthesis methods. The generated graphs provide controllable, evaluable synthetic data to advance GFM training in resource-constrained settings.

📝 Abstract
The era of foundation models has revolutionized AI research, yet Graph Foundation Models (GFMs) remain constrained by the scarcity of large-scale graph corpora. Traditional graph data synthesis techniques primarily focus on simplistic structural operations, lacking the capacity to generate semantically rich nodes with meaningful textual attributes: a critical limitation for real-world applications. While large language models (LLMs) demonstrate exceptional text generation capabilities, their direct application to graph synthesis is impeded by context window limitations, hallucination phenomena, and structural consistency challenges. To address these issues, we introduce GraphMaster, the first multi-agent framework specifically designed for graph data synthesis in data-limited environments. GraphMaster orchestrates four specialized LLM agents (Manager, Perception, Enhancement, and Evaluation) that collaboratively optimize the synthesis process through iterative refinement, ensuring both semantic coherence and structural integrity. To rigorously evaluate our approach, we create new data-limited "Sub" variants of six standard graph benchmarks, specifically designed to test synthesis capabilities under realistic constraints. Additionally, we develop a novel interpretability assessment framework that combines human evaluation with a principled Grassmannian manifold-based analysis, providing both qualitative and quantitative measures of semantic coherence. Experimental results demonstrate that GraphMaster significantly outperforms traditional synthesis methods across multiple datasets, establishing a strong foundation for advancing GFMs in data-scarce environments.
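The four-agent loop described in the abstract can be pictured with a small Python sketch. This is only an illustration of the control flow, not the paper's implementation: the abstract names the agents but does not specify their behavior, so every function below (`perceive`, `enhance`, `evaluate`, `synthesize`), the `TextGraph` type, and the selection and acceptance rules are hypothetical stand-ins, and no LLM is actually called.

```python
from dataclasses import dataclass, field


@dataclass
class TextGraph:
    nodes: dict = field(default_factory=dict)  # node id -> text attribute
    edges: set = field(default_factory=set)    # undirected edges as (small_id, large_id)


def perceive(graph, k=2):
    """Perception stand-in (assumption): pick the k lowest-degree nodes as context."""
    degree = {n: 0 for n in graph.nodes}
    for u, v in graph.edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree, key=lambda n: (degree[n], n))[:k]


def enhance(graph, context):
    """Enhancement stand-in (assumption): a real system would prompt an LLM here."""
    new_id = max(graph.nodes) + 1
    text = "synthetic node related to: " + "; ".join(graph.nodes[n] for n in context)
    return new_id, text, {(n, new_id) for n in context}


def evaluate(graph, proposal):
    """Evaluation stand-in (assumption): require non-empty text and well-formed edges."""
    new_id, text, edges = proposal
    known = set(graph.nodes) | {new_id}
    edges_ok = all(u in known and v in known and u != v for u, v in edges)
    return bool(text.strip()) and edges_ok


def synthesize(graph, target_nodes, max_rounds=20):
    """Manager stand-in (assumption): iterate until the target size or round budget."""
    for _ in range(max_rounds):
        if len(graph.nodes) >= target_nodes:
            break
        proposal = enhance(graph, perceive(graph))
        if evaluate(graph, proposal):  # only accepted proposals modify the graph
            new_id, text, edges = proposal
            graph.nodes[new_id] = text
            graph.edges |= edges
    return graph


seed = TextGraph(
    nodes={0: "graph neural networks", 1: "language models", 2: "data augmentation"},
    edges={(0, 1)},
)
result = synthesize(seed, target_nodes=5)
print(len(result.nodes), len(result.edges))
```

The point of the sketch is the separation of concerns: context selection, generation, and validation are independent steps, and a rejected proposal leaves the graph unchanged, which is one plausible way an iterative refinement loop can protect structural integrity.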
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of large-scale graph corpora for GFMs
Overcomes LLM limitations in graph synthesis tasks
Ensures semantic and structural integrity in synthesized graphs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLM framework for graph synthesis
Iterative refinement ensures semantic and structural integrity
Novel interpretability assessment with Grassmannian manifold analysis
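To give a sense of what a Grassmannian manifold-based comparison can look like, here is a minimal NumPy sketch of the standard geodesic distance between two subspaces, computed from their principal angles. The paper's exact metric is not given in this summary, so this is a generic textbook construction rather than the authors' method; the function names and the idea of representing each graph's text embeddings by a spanning matrix are assumptions.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B.

    A and B are (d, k) arrays with linearly independent columns.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def grassmann_distance(A, B):
    """Geodesic distance on the Grassmannian: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(A, B)))


# Toy subspaces in R^3 (hypothetical data, not from the paper).
xy_plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
xz_plane = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
# Same plane as xy_plane, expressed in a different basis.
rotated_xy = xy_plane @ np.array([[1.0, 1.0], [-1.0, 1.0]])

d_diff = grassmann_distance(xy_plane, xz_plane)
d_same = grassmann_distance(xy_plane, rotated_xy)
print(d_diff, d_same)
```

The two planes share the x-axis and are orthogonal in their remaining direction, so their principal angles are 0 and π/2. The distance depends only on the subspace and not on the basis chosen to describe it, which is why `rotated_xy` is at (numerically) zero distance from `xy_plane`.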