Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks

📅 2025-06-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of developing a general-purpose graph foundation model for heterogeneous graph data, where existing methods struggle to uniformly model multi-scale and cross-domain graph structures. We propose a random-walk-based node serialization technique that converts arbitrary graphs into tokenized node sequences compatible with Transformer architectures, coupled with a context-prediction self-supervised loss. We theoretically prove that this representation effectively discriminates between local neighborhood structures and global topological properties. The model employs a pure Transformer architecture, enabling end-to-end pretraining and graph-level representation aggregation. Trained at scale across diverse graph domains, it achieves state-of-the-art performance on downstream tasks—including node classification, link prediction, and graph classification—outperforming both conventional GNNs and prior graph pretraining approaches. This is the first empirical demonstration of a scalable, generalizable graph foundation model, validating its feasibility and broad transfer potential.

Technology Category

Application Category

📝 Abstract
A foundation model like GPT elicits many emergent abilities, owing to the pre-training with broad inclusion of data and the use of the powerful Transformer architecture. While foundation models in natural languages are prevalent, can we build similar models for graphs? This paper describes an approach toward a graph foundation model that is pre-trained with diverse graph datasets by adapting the Transformer backbone. A central challenge toward this end is how a sequence model encodes graphs of varying sizes and from different domains. We propose representing a node as multiple random walks, such that the Transformer can extract node representations from sequences, which in turn form edge and graph representations. We develop a novel context prediction loss for these random walks and theoretically analyze their expressive power in distinguishing neighborhoods and graphs. We also demonstrate the pre-training of our model and its adaptation to downstream tasks, showcasing its potential as a foundation for processing and reasoning with graph-structured data.
Problem

Research questions and friction points this paper is trying to address.

Building a graph foundation model using Transformer architecture
Encoding graphs of varying sizes and different domains
Pre-training model for downstream graph-structured data tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-training Transformers with diverse graph datasets
Representing nodes via multiple random walks
Novel context prediction loss for random walks
🔎 Similar Papers
No similar papers found.
Ziyuan Tang
Ziyuan Tang
University of Minnesota
J
Jie Chen
MIT-IBM Watson AI Lab, IBM Research