🤖 AI Summary
This work addresses the challenge of developing a general-purpose graph foundation model for heterogeneous graph data, where existing methods struggle to uniformly model multi-scale and cross-domain graph structures. We propose a random-walk-based node serialization technique that converts arbitrary graphs into tokenized node sequences compatible with Transformer architectures, coupled with a context-prediction self-supervised loss. We theoretically prove that this representation effectively discriminates between local neighborhood structures and global topological properties. The model employs a pure Transformer architecture, enabling end-to-end pretraining and graph-level representation aggregation. Trained at scale across diverse graph domains, it achieves state-of-the-art performance on downstream tasks—including node classification, link prediction, and graph classification—outperforming both conventional GNNs and prior graph pretraining approaches. This is the first empirical demonstration of a scalable, generalizable graph foundation model, validating its feasibility and broad transfer potential.
📝 Abstract
A foundation model like GPT elicits many emergent abilities, owing to the pre-training with broad inclusion of data and the use of the powerful Transformer architecture. While foundation models in natural languages are prevalent, can we build similar models for graphs? This paper describes an approach toward a graph foundation model that is pre-trained with diverse graph datasets by adapting the Transformer backbone. A central challenge toward this end is how a sequence model encodes graphs of varying sizes and from different domains. We propose representing a node as multiple random walks, such that the Transformer can extract node representations from sequences, which in turn form edge and graph representations. We develop a novel context prediction loss for these random walks and theoretically analyze their expressive power in distinguishing neighborhoods and graphs. We also demonstrate the pre-training of our model and its adaptation to downstream tasks, showcasing its potential as a foundation for processing and reasoning with graph-structured data.