🤖 AI Summary
Training on trillion-edge industrial graphs faces two challenges: online subgraph sampling in distributed settings is bottlenecked on a single machine, while offline precomputation incurs prohibitive storage and I/O overheads. To address both, this paper proposes the first co-scheduling architecture that jointly optimizes subgraph generation and in-memory graph learning. Built on a distributed in-memory computing framework, the architecture integrates topology-aware sampling, pipelined subgraph construction, and asynchronous gradient synchronization, enabling fully in-memory, distributed, real-time subgraph generation without external storage and eliminating precomputation entirely. Experiments demonstrate that the approach achieves 27× higher subgraph generation throughput than SQL-based methods and 1.3× that of GraphGen, supports per-iteration training on million-node graphs, and reduces I/O overhead to zero.
📝 Abstract
Graph-based computations are crucial in a wide range of applications, where graphs can scale to trillions of edges. To enable efficient training on such large graphs, mini-batch subgraph sampling is commonly used, which allows training without loading the entire graph into memory. However, existing solutions face significant trade-offs: online subgraph generation, as seen in frameworks like DGL and PyG, is limited to a single machine, resulting in severe performance bottlenecks, while offline precomputed subgraphs, as in GraphGen, improve sampling efficiency but introduce large storage overhead and high I/O costs during training. To address these challenges, we propose **GraphGen+**, an integrated framework that synchronizes distributed subgraph generation with in-memory graph learning, eliminating the need for external storage while significantly improving efficiency. GraphGen+ achieves a **27×** speedup in subgraph generation compared to conventional SQL-like methods and a **1.3×** speedup over GraphGen, supporting training on 1 million nodes per iteration and removing the overhead associated with precomputed subgraphs. This makes it a scalable and practical solution for industry-scale graph learning.
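To make the mini-batch subgraph sampling idea concrete, the sketch below shows the generic k-hop neighbor-sampling pattern that frameworks like DGL and PyG implement (not the paper's distributed algorithm). All names (`sample_subgraph`, `adj`, `fanout`) are hypothetical; a real system would operate on a partitioned graph store rather than a Python dict.

```python
import random

def sample_subgraph(adj, seeds, fanout, hops, rng=random.Random(0)):
    """Hypothetical sketch of k-hop neighbor sampling.

    From each seed node, keep at most `fanout` randomly chosen
    neighbors per hop, so each mini-batch trains on a small
    subgraph instead of the full graph.
    """
    nodes = set(seeds)          # nodes included in the subgraph
    edges = []                  # sampled (src, dst) edges
    frontier = list(seeds)      # nodes to expand this hop
    for _ in range(hops):
        next_frontier = []
        for u in frontier:
            nbrs = adj.get(u, [])
            # Downsample the neighborhood to bound subgraph size.
            for v in rng.sample(nbrs, min(fanout, len(nbrs))):
                edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges

# Tiny toy graph as an adjacency list.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 4], 3: [0], 4: [2]}
nodes, edges = sample_subgraph(adj, seeds=[0], fanout=2, hops=2)
```

The trade-off the abstract describes is where this loop runs: online per-iteration on one machine (DGL/PyG), offline into storage (GraphGen), or distributed in memory alongside training (GraphGen+).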