🤖 AI Summary
Existing GPU-accelerated single-source shortest path (SSSP) algorithms suffer from high synchronization overhead, substantial memory latency, and poor adaptability to diverse graph structures due to their reliance on a single fixed queue. This work proposes a multi-level multi-queue (MLMQ) architecture that, for the first time, introduces a hierarchical queue design into SSSP computation. By aligning with the GPU's memory hierarchy and parallelism characteristics, MLMQ achieves efficient queue scheduling through unified read-write primitives, a cache-aware cooperative queuing mechanism, and an input-aware dynamic configuration strategy. Evaluated across a variety of graph datasets, the proposed approach achieves average speedups ranging from 1.87× to 17.13× over state-of-the-art methods, significantly expanding the optimization potential of SSSP on GPUs.
📝 Abstract
As one of the most fundamental problems in graph processing, the Single-Source Shortest Path (SSSP) problem plays a critical role in numerous application scenarios. However, existing GPU-based solutions remain inefficient, as they typically rely on a single, fixed queue design that incurs severe synchronization overhead, high memory latency, and poor adaptivity to diverse inputs. To address these inefficiencies, we propose MultiLevelMultiQueue (MLMQ), a novel data structure that distributes multiple queues across the GPU's multi-level parallelism and memory hierarchy. To realize MLMQ, we introduce a cache-like collaboration mechanism for efficient inter-queue coordination, and develop a modular queue design based on unified Read and Write primitives. Within this framework, we expand the optimization space by designing a set of GPU-friendly queues, composing them across multiple levels, and further providing an input-adaptive MLMQ configuration scheme. Our MLMQ design achieves average speedups of 1.87× to 17.13× over state-of-the-art implementations. Our code is open-sourced at https://github.com/Leo9660/MLMQ.git.