MQ-GNN: A Multi-Queue Pipelined Architecture for Scalable and Efficient GNN Training

📅 2026-01-08
🏛️ IEEE Access
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses inefficiencies in graph neural network (GNN) training caused by slow mini-batch generation, data transfer bottlenecks, and synchronization overhead across multiple GPUs, which collectively lead to low hardware utilization. To overcome these challenges, the authors propose MQ-GNN, a framework featuring a multi-queue pipelined architecture that interleaves the training stages. MQ-GNN introduces the Ready-to-Update Asynchronous Consistent Model (RaCoM) to enable asynchronous gradient sharing with adaptive periodic synchronization. It further incorporates a global neighbor-sampling cache and an adaptive queue-sizing strategy to raise throughput while preserving model consistency. Experiments on four large-scale datasets demonstrate that MQ-GNN achieves up to 4.6× speedup over ten baselines, improves GPU utilization by 30%, and maintains competitive accuracy.
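The stage-interleaving idea in the summary above can be sketched with bounded queues and one thread per stage: while one mini-batch is being trained, the next is being transferred and a third is being sampled. This is a minimal illustration, not MQ-GNN's implementation; the stage bodies (`sample`, `transfer`, `train`) and all names are placeholder assumptions.

```python
import queue
import threading

# Placeholder stage bodies standing in for the real pipeline stages.
def sample(batch_id):
    return f"batch-{batch_id}"           # mini-batch generation (CPU)

def transfer(batch):
    return batch + "-on-gpu"             # host-to-device copy

def train(batch):
    return batch + "-trained"            # forward/backward pass

def stage_worker(fn, inq, outq):
    """Consume items from inq, apply fn, push results to outq."""
    while True:
        item = inq.get()
        if item is None:                 # sentinel: propagate shutdown
            outq.put(None)
            return
        outq.put(fn(item))

def run_pipeline(num_batches, queue_size=4):
    # Bounded queues decouple the stages so sampling, transfer, and
    # training overlap; the queue size caps in-flight batches (memory).
    q_sampled = queue.Queue(maxsize=queue_size)
    q_on_gpu = queue.Queue(maxsize=queue_size)
    q_done = queue.Queue()
    threads = [
        threading.Thread(target=stage_worker, args=(transfer, q_sampled, q_on_gpu)),
        threading.Thread(target=stage_worker, args=(train, q_on_gpu, q_done)),
    ]
    for t in threads:
        t.start()
    for i in range(num_batches):
        q_sampled.put(sample(i))         # producer stage runs inline here
    q_sampled.put(None)                  # signal end of input
    results = []
    while (item := q_done.get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

The bounded queue sizes are the knob that MQ-GNN's adaptive queue-sizing strategy would tune: larger queues hide more stage latency at the cost of memory.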

📝 Abstract
Graph Neural Networks (GNNs) are powerful tools for learning graph-structured data, but their scalability is hindered by inefficient mini-batch generation, data transfer bottlenecks, and costly inter-GPU synchronization. Existing training frameworks fail to overlap these stages, leading to suboptimal resource utilization. This paper proposes MQ-GNN, a multi-queue pipelined framework that maximizes training efficiency by interleaving GNN training stages and optimizing resource utilization. MQ-GNN introduces Ready-to-Update Asynchronous Consistent Model (RaCoM), which enables asynchronous gradient sharing and model updates while ensuring global consistency through adaptive periodic synchronization. Additionally, it employs global neighbor sampling with caching to reduce data transfer overhead and an adaptive queue-sizing strategy to balance computation and memory efficiency. Experiments on four large-scale datasets and ten baseline models demonstrate that MQ-GNN achieves up to 4.6× faster training time and 30% improved GPU utilization while maintaining competitive accuracy. These results establish MQ-GNN as a scalable and efficient solution for multi-GPU GNN training. The code is available at MQ-GNN.
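The RaCoM scheme described in the abstract can be illustrated with a sequential simulation: each worker updates its own parameter copy without waiting on the others, and every few steps all copies are averaged to restore global consistency. This is a sketch under assumptions of our own; the fixed `sync_period` stands in for the paper's adaptive period selection, and all names are illustrative.

```python
def average(params_list):
    """Elementwise mean of several parameter vectors."""
    n = len(params_list)
    return [sum(p) / n for p in zip(*params_list)]

def train_async(num_workers, num_steps, sync_period, grad_fn, lr=0.1):
    # Each worker holds an independent parameter copy (here a 1-element
    # vector) and applies its local gradient without any locking.
    params = [[0.0] for _ in range(num_workers)]
    for step in range(1, num_steps + 1):
        for w in range(num_workers):
            g = grad_fn(params[w], w)              # local gradient
            params[w] = [p - lr * gi for p, gi in zip(params[w], g)]
        if step % sync_period == 0:                # periodic consistency point
            synced = average(params)
            params = [list(synced) for _ in range(num_workers)]
    return average(params)
```

For example, with `grad_fn` set to the gradient of `(p - 1)^2`, all worker copies converge toward 1.0 even though they only synchronize every `sync_period` steps, which is the trade the abstract describes: fewer synchronization barriers in exchange for bounded staleness between copies.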
Problem

Research questions and friction points this paper is trying to address.

Graph Neural Networks
scalability
multi-GPU training
data transfer bottleneck
inter-GPU synchronization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Queue Pipeline
Asynchronous Consistent Model
Global Neighbor Sampling
Adaptive Synchronization
Scalable GNN Training
Irfan Ullah
Department of Computer Science and Engineering, Kyung Hee University (Global Campus), Republic of Korea
Young-Koo Lee
Kyung Hee University
Big Data Processing and Analysis · Data Mining · Database Systems