🤖 AI Summary
Existing CPU and GPU frameworks struggle to support dynamic random walks efficiently: because transition probabilities change at runtime, precomputation no longer applies, and practitioners fall back on labor-intensive manual tuning. This paper introduces FlexiWalker, the first general-purpose GPU framework tailored for dynamic random walks. Its core contributions are: (1) high-performance rejection and reservoir sampling kernels that avoid global reductions, redundant memory accesses, and unnecessary random number generation; (2) a lightweight first-order cost model that selects the faster sampling strategy per node at runtime; and (3) compile-time specialization and automatic code generation for user-defined walk logic. Evaluated on real-world graphs, the framework achieves geometric mean speedups of 73.44× over the best CPU baseline and 5.91× over the best GPU baseline. It also executes dynamic walk workloads that prior systems cannot support.
📝 Abstract
Dynamic random walks are fundamental to various graph analysis applications, offering advantages by adapting to evolving graph properties. Their runtime-dependent transition probabilities break the pre-computation strategy that underpins most existing CPU and GPU optimizations for static random walks. As a result, practitioners must either accept suboptimal frameworks or write hand-tuned kernels that do not adapt to workload diversity. To address this issue, we present FlexiWalker, the first GPU framework that delivers efficient, workload-generic support for dynamic random walks. Our design-space study shows that rejection sampling and reservoir sampling are more suitable than other sampling techniques under massive parallelism. Thus, we devise (i) new high-performance kernels for both that eliminate global reductions, redundant memory accesses, and unnecessary random-number generation. Because the best-fitting sampling strategy must be chosen at runtime, we adopt (ii) a lightweight first-order cost model that selects the faster kernel per node. To enhance usability, we introduce (iii) a compile-time component that automatically specializes user-supplied walk logic into optimized building blocks. On various dynamic random walk workloads with real-world graphs, FlexiWalker outperforms the best published CPU and GPU baselines by geometric means of 73.44× and 5.91×, respectively, while successfully executing workloads that prior systems cannot support. We open-source FlexiWalker at https://github.com/AIS-SNU/FlexiWalker.
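To make the two sampling strategies and the per-node selection concrete, the following is a minimal sequential sketch in Python. It is not FlexiWalker's GPU implementation: the function names, the `max_trials` fallback, and in particular the `choose_sampler` rule (comparing expected rejection trials against node degree) are illustrative assumptions, not the paper's cost model.

```python
import random

def reservoir_sample(neighbors, weight_fn, rng):
    """Weighted reservoir sampling: one pass over all neighbors, no prefix sum."""
    chosen, total = None, 0.0
    for v in neighbors:
        w = weight_fn(v)          # weight is computed at walk time
        if w <= 0.0:
            continue
        total += w
        # Keep v with probability w / total; earlier picks survive otherwise.
        if rng.random() * total < w:
            chosen = v
    return chosen

def rejection_sample(neighbors, weight_fn, w_max, rng, max_trials=1000):
    """Rejection sampling: propose uniformly, accept with probability w / w_max.

    Assumes w_max is an upper bound on every neighbor weight.
    """
    if not neighbors:
        return None
    for _ in range(max_trials):
        v = neighbors[rng.randrange(len(neighbors))]
        if rng.random() * w_max < weight_fn(v):
            return v
    # Assumed fallback: scan all neighbors if proposals keep being rejected.
    return reservoir_sample(neighbors, weight_fn, rng)

def choose_sampler(degree, w_max, w_mean_est):
    """Hypothetical first-order rule (an assumption, not the paper's model).

    Rejection needs about w_max / mean_weight weight evaluations on average,
    while reservoir sampling always evaluates `degree` weights.
    """
    if w_mean_est <= 0.0:
        return "reservoir"
    expected_trials = w_max / w_mean_est
    return "rejection" if expected_trials < degree else "reservoir"

def step(neighbors, weight_fn, w_max, w_mean_est, rng):
    """Pick the next node using whichever sampler the rule predicts is cheaper."""
    if choose_sampler(len(neighbors), w_max, w_mean_est) == "rejection":
        return rejection_sample(neighbors, weight_fn, w_max, rng)
    return reservoir_sample(neighbors, weight_fn, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    weights = {10: 1.0, 20: 3.0}   # toy dynamic weights for two neighbors
    print(step(list(weights), weights.get, 3.0, 2.0, rng))
```

Both samplers draw a neighbor with probability proportional to its weight without building a cumulative distribution over all neighbors, which is why they remain usable when weights are only known at walk time. The selection rule reflects the general trade-off: rejection is cheap when weights are close to the bound, while reservoir sampling has a predictable cost on skewed weight distributions.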