OMEGA: A Low-Latency GNN Serving System for Large Graphs

📅 2025-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high inference latency, heavy resource overhead, and accuracy degradation of Graph Neural Network (GNN) inference on large-scale graphs, this paper proposes OMEGA, a low-latency, high-accuracy GNN serving framework. OMEGA introduces two key innovations: (1) a selective recomputation mechanism that recomputes only critical precomputed embeddings on demand, eliminating the need for full embedding caching and significantly reducing memory footprint and data-transfer overhead; and (2) computation graph parallelism, a cross-machine collaborative execution framework that enables fine-grained, operator-level distributed scheduling and pipelined execution. Evaluated on large real-world graphs, including Reddit and OGBN-Products, OMEGA reduces end-to-end inference latency by up to 42–68% compared with state-of-the-art systems (e.g., DGL, PyTorch Geometric) while keeping accuracy loss negligible (<0.3%). The approach demonstrates strong scalability and practical effectiveness across diverse large-scale graph benchmarks.

📝 Abstract
Graph Neural Networks (GNNs) have been widely adopted for their ability to compute expressive node representations in graph datasets. However, serving GNNs on large graphs is challenging due to the high communication, computation, and memory overheads of constructing and executing computation graphs, which represent information flow across large neighborhoods. Existing approximation techniques in training can mitigate the overheads but, in serving, still lead to high latency and/or accuracy loss. To this end, we propose OMEGA, a system that enables low-latency GNN serving for large graphs with minimal accuracy loss through two key ideas. First, OMEGA employs selective recomputation of precomputed embeddings, which allows for reusing precomputed computation subgraphs while selectively recomputing a small fraction to minimize accuracy loss. Second, we develop computation graph parallelism, which reduces communication overhead by parallelizing the creation and execution of computation graphs across machines. Our evaluation with large graph datasets and GNN models shows that OMEGA significantly outperforms state-of-the-art techniques.
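The abstract's first idea, selective recomputation, can be illustrated with a minimal sketch. This is not OMEGA's actual algorithm (the paper's scoring policy and GNN architecture are not reproduced here); it is a hypothetical two-layer mean-aggregation GNN where most neighbor embeddings are served from a precomputed cache and only a small "critical" fraction (here, simply the highest-degree neighbors, a placeholder criticality score) is recomputed from raw features.

```python
import numpy as np

def serve_node(node, graph, feats, cache, weight, recompute_frac=0.1):
    """Serve one node's embedding, reusing cached layer-1 neighbor
    embeddings and selectively recomputing a small critical fraction.

    graph:  dict node -> list of neighbor nodes
    feats:  dict node -> raw feature vector (np.ndarray)
    cache:  dict node -> precomputed layer-1 embedding
    weight: shared linear layer, shape (d, d)
    """
    neighbors = graph[node]
    # Placeholder criticality score: recompute the highest-degree neighbors.
    scored = sorted(neighbors, key=lambda v: len(graph[v]), reverse=True)
    k = max(1, int(len(scored) * recompute_frac))
    recompute = set(scored[:k])

    embs = []
    for v in neighbors:
        if v in recompute:
            # Layer 1 from scratch: mean over v's closed neighborhood, then linear + ReLU.
            agg = np.mean([feats[u] for u in graph[v]] + [feats[v]], axis=0)
            embs.append(np.maximum(agg @ weight, 0.0))
        else:
            # Reuse the (possibly stale) precomputed embedding.
            embs.append(cache[v])

    # Layer 2 for the target node.
    return np.maximum(np.mean(embs, axis=0) @ weight, 0.0)
```

The trade-off the abstract describes falls out of `recompute_frac`: at 0 the server is pure caching (fast, but stale embeddings may hurt accuracy), at 1 it is full recomputation (accurate, but with the full computation-graph cost the paper is trying to avoid).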
Problem

Research questions and friction points this paper is trying to address.

Graph Neural Networks
Large-scale Graph Data
Computational Resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

OMEGA system
Distributed Processing
Graph Neural Networks (GNNs)