🤖 AI Summary
Existing GPU-accelerated single-source shortest path (SSSP) algorithms suffer from high synchronization overhead, substantial memory latency, and poor adaptability to diverse graph structures due to their reliance on a single fixed queue. This work proposes a multi-level multi-queue (MLMQ) architecture that, for the first time, introduces a hierarchical queue design into SSSP computation. By aligning with the GPU's memory hierarchy and parallelism characteristics, MLMQ achieves efficient queue scheduling through unified read-write primitives, a cache-aware cooperative queuing mechanism, and an input-aware dynamic configuration strategy. Evaluated across a variety of graph datasets, the proposed approach achieves average speedups ranging from 1.87× to 17.13× over state-of-the-art methods, significantly expanding the optimization potential of SSSP on GPUs.
📝 Abstract
As one of the most fundamental problems in graph processing, the Single-Source Shortest Path (SSSP) problem plays a critical role in numerous application scenarios. However, existing GPU-based solutions remain inefficient, as they typically rely on a single, fixed queue design that incurs severe synchronization overhead, high memory latency, and poor adaptivity to diverse inputs. To address these inefficiencies, we propose MultiLevelMultiQueue (MLMQ), a novel data structure that distributes multiple queues across the GPU's multi-level parallelism and memory hierarchy. To realize MLMQ, we introduce a cache-like collaboration mechanism for efficient inter-queue coordination, and develop a modular queue design based on unified Read and Write primitives. Within this framework, we expand the optimization space by designing a set of GPU-friendly queues, composing them across multiple levels, and further providing an input-adaptive MLMQ configuration scheme. Our MLMQ design achieves average speedups of 1.87× to 17.13× over state-of-the-art implementations. Our code is open-sourced at https://github.com/Leo9660/MLMQ.git.