🤖 AI Summary
This work addresses the challenges of applying state space models (SSMs) to large-scale graph data, where non-sequential graph structures and high computational costs hinder effective deployment. To overcome these limitations, we propose a novel Graph State Space Model that integrates a graph context gating mechanism to dynamically modulate the aggregation of multi-hop neighborhood information. Furthermore, we introduce a cross-batch aggregation strategy to enhance both training efficiency and generalization on large graphs. Our approach seamlessly unifies SSMs, graph neural networks, and multi-hop context sampling, achieving significant performance gains over existing baselines on standard graph benchmarks. Theoretical analysis further demonstrates that the proposed cross-batch aggregation effectively reduces training error.
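The graph context gating described above can be pictured as a learned, per-hop gate that weights how much each multi-hop neighborhood contributes to a node's representation. Below is a minimal numpy sketch; the function name, the sigmoid gating form, and the mean-aggregation per hop are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gated_aggregation(adj, feats, num_hops, gate_logits):
    """Aggregate multi-hop neighborhood features, weighting each hop by a gate.

    Hypothetical sketch: one scalar gate per hop; the paper's gating may
    instead condition on node features or graph context.
    """
    # Row-normalize the adjacency so each hop is a mean over neighbors.
    deg = adj.sum(axis=1, keepdims=True)
    a_norm = adj / np.maximum(deg, 1.0)

    out = np.zeros_like(feats)
    hop = feats
    for k in range(num_hops):
        hop = a_norm @ hop                     # k-th hop neighborhood context
        out += sigmoid(gate_logits[k]) * hop   # gate controls this hop's contribution
    return out
```

With strongly negative gate logits a hop is effectively switched off, so the model can learn to ignore noisy distant neighborhoods while keeping informative ones.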
📝 Abstract
State space models (SSMs) have recently emerged for modeling long-range dependencies in sequence data, at much lower computational cost than modern alternatives such as transformers. Advancing SSMs to graph-structured data, especially large graphs, is a significant challenge: SSMs are sequence models, and the sheer volume of graph data makes it very expensive to convert graphs into sequences for effective learning. In this paper, we propose COMBA to tackle large-graph learning using state space models, with two key innovations: graph context gating and cross-batch aggregation. A graph context refers to a given hop of neighborhood around each node, and graph context gating allows COMBA to use such contexts to learn the best control of neighbor aggregation. For each graph context, COMBA samples nodes as batches and trains a graph neural network (GNN), with information aggregated across batches, allowing COMBA to scale to large graphs. Our theoretical study asserts that cross-batch aggregation guarantees lower error than training the GNN without aggregation. Experiments on benchmark networks demonstrate significant performance gains over baseline approaches. Code and benchmark datasets will be released for public access.
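The cross-batch aggregation idea, where information learned in earlier mini-batches informs later ones, can be sketched as a persistent per-node memory updated as batches are sampled. The class name, the exponential-moving-average update, and the `momentum` parameter below are assumptions for illustration; the paper's exact aggregation rule may differ.

```python
import numpy as np

class CrossBatchAggregator:
    """Persistent node-embedding memory updated across sampled mini-batches.

    Hypothetical sketch: blends each batch's fresh embeddings into a running
    store so later batches see information aggregated from earlier ones.
    """
    def __init__(self, num_nodes, dim, momentum=0.9):
        self.memory = np.zeros((num_nodes, dim))
        self.momentum = momentum

    def update(self, node_ids, batch_emb):
        # Exponential moving average: keep `momentum` of the old memory,
        # mix in (1 - momentum) of the current batch's embeddings.
        m = self.momentum
        self.memory[node_ids] = m * self.memory[node_ids] + (1 - m) * batch_emb
        return self.memory[node_ids]
```

Because the memory persists across batches, a node sampled in several batches accumulates a smoothed embedding rather than depending on any single batch, which is the intuition behind the claimed reduction in training error.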