Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks

📅 2025-06-17

📈 Citations: 0

✨ Influential: 0

career value

155K/year

🤖 AI Summary

This work addresses the challenge of developing a general-purpose graph foundation model for heterogeneous graph data, where existing methods struggle to uniformly model multi-scale and cross-domain graph structures. We propose a random-walk-based node serialization technique that converts arbitrary graphs into tokenized node sequences compatible with Transformer architectures, coupled with a context-prediction self-supervised loss. We theoretically prove that this representation effectively discriminates between local neighborhood structures and global topological properties. The model employs a pure Transformer architecture, enabling end-to-end pretraining and graph-level representation aggregation. Trained at scale across diverse graph domains, it achieves state-of-the-art performance on downstream tasks—including node classification, link prediction, and graph classification—outperforming both conventional GNNs and prior graph pretraining approaches. This is the first empirical demonstration of a scalable, generalizable graph foundation model, validating its feasibility and broad transfer potential.

Technology Category

Application Category

📝 Abstract

A foundation model like GPT elicits many emergent abilities, owing to the pre-training with broad inclusion of data and the use of the powerful Transformer architecture. While foundation models in natural languages are prevalent, can we build similar models for graphs? This paper describes an approach toward a graph foundation model that is pre-trained with diverse graph datasets by adapting the Transformer backbone. A central challenge toward this end is how a sequence model encodes graphs of varying sizes and from different domains. We propose representing a node as multiple random walks, such that the Transformer can extract node representations from sequences, which in turn form edge and graph representations. We develop a novel context prediction loss for these random walks and theoretically analyze their expressive power in distinguishing neighborhoods and graphs. We also demonstrate the pre-training of our model and its adaptation to downstream tasks, showcasing its potential as a foundation for processing and reasoning with graph-structured data.

Problem

Research questions and friction points this paper is trying to address.

Building a graph foundation model using Transformer architecture

Encoding graphs of varying sizes and different domains

Pre-training model for downstream graph-structured data tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-training Transformers with diverse graph datasets

Representing nodes via multiple random walks

Novel context prediction loss for random walks

🔎 Similar Papers

Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-Training on Industrial-Scale Data