Beyond a Single Queue: Multi-Level-Multi-Queue as an Effective Design for SSSP problems on GPUs

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing GPU-accelerated single-source shortest path (SSSP) algorithms suffer from high synchronization overhead, substantial memory latency, and poor adaptability to diverse graph structures due to their reliance on a single fixed queue. This work proposes a multi-level multi-queue (MLMQ) architecture that, for the first time, introduces a hierarchical queue design into SSSP computation. By aligning with the GPU's memory hierarchy and parallelism characteristics, MLMQ achieves efficient queue scheduling through unified read-write primitives, a cache-aware cooperative queuing mechanism, and an input-aware dynamic configuration strategy. Evaluated across a variety of graph datasets, the proposed approach achieves average speedups ranging from 1.87× to 17.13× over state-of-the-art methods, significantly expanding the optimization space of SSSP on GPUs.

📝 Abstract
As one of the most fundamental problems in graph processing, the Single-Source Shortest Path (SSSP) problem plays a critical role in numerous application scenarios. However, existing GPU-based solutions remain inefficient, as they typically rely on a single, fixed queue design that incurs severe synchronization overhead, high memory latency, and poor adaptivity to diverse inputs. To address these inefficiencies, we propose MultiLevelMultiQueue (MLMQ), a novel data structure that distributes multiple queues across the GPU's multi-level parallelism and memory hierarchy. To realize MLMQ, we introduce a cache-like collaboration mechanism for efficient inter-queue coordination, and develop a modular queue design based on unified Read and Write primitives. Within this framework, we expand the optimization space by designing a set of GPU-friendly queues, composing them across multiple levels, and further providing an input-adaptive MLMQ configuration scheme. Our MLMQ design achieves average speedups of 1.87x to 17.13x over state-of-the-art implementations. Our code is open-sourced at https://github.com/Leo9660/MLMQ.git.
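The abstract describes distributing work across multiple priority-ordered queues rather than a single fixed one. The paper's actual MLMQ is a GPU-side data structure built on unified Read/Write primitives and the GPU memory hierarchy; as a rough CPU-side analogue of multi-queue SSSP scheduling (delta-stepping-style distance buckets), one might sketch the idea as follows. The function name, the `delta` parameter, and the bucket layout here are illustrative assumptions, not the paper's implementation:

```python
import math

def sssp_multi_bucket(adj, src, delta=1.0):
    """Illustrative CPU analogue of multi-queue SSSP scheduling.

    Vertices are spread across distance buckets (one 'queue' per
    distance range of width `delta`); the lowest non-empty bucket
    is processed first. adj: {u: [(v, w), ...]} with w >= 0.
    """
    dist = {v: math.inf for v in adj}
    dist[src] = 0.0
    buckets = {0: {src}}          # bucket index -> set of vertices
    while buckets:
        i = min(buckets)          # lowest non-empty bucket = highest priority
        frontier = buckets.pop(i)
        while frontier:
            next_frontier = set()
            for u in frontier:
                for v, w in adj[u]:
                    nd = dist[u] + w
                    if nd < dist[v]:
                        dist[v] = nd
                        b = int(nd // delta)
                        if b == i:
                            # Improvement stays in the current range:
                            # re-relax within this bucket.
                            next_frontier.add(v)
                        else:
                            buckets.setdefault(b, set()).add(v)
            frontier = next_frontier
    return dist
```

A vertex may be enqueued in more than one bucket after repeated improvements; relaxations always read the current `dist[u]`, so stale entries only cost redundant work, not correctness. The GPU design in the paper additionally layers such queues across thread, warp/block, and device levels to match the memory hierarchy.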
Problem

Research questions and friction points this paper is trying to address.

SSSP
GPU
queue design
synchronization overhead
memory latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

MultiLevelMultiQueue
GPU graph processing
SSSP
multi-queue design
input-adaptive optimization
Zhengding Hu
University of Science and Technology of China
Jingwen Sun
University of Science and Technology of China
Le Jiang
University of Science and Technology of China
Yuhao Wang
University of Science and Technology of China
Junqing Lin
University of Science and Technology of China
Yi Zong
School of Computer Science, Fudan University
Guangzhong Sun
University of Science and Technology of China