🤖 AI Summary
To address redundancy, neighborhood explosion, and high inter-node communication overhead in streaming GNN inference on dynamic graphs, this paper proposes a real-time incremental GNN inference framework. Methodologically, it introduces: (1) a general-purpose incremental programming model grounded in the mathematical properties of aggregation functions, enabling exact and provably correct embedding updates; and (2) a unified single-machine and distributed incremental execution paradigm integrating local influence propagation, sensitivity-driven pruning, streaming graph partitioning, and lightweight synchronization. Experiments show that the single-machine version achieves throughputs of up to 28K updates/s (Arxiv) and 1.2K updates/s (Products), with end-to-end latency ranging from 0.1 ms to 1 s. The distributed variant improves throughput by up to 30× and reduces cross-node communication overhead by 70×, significantly outperforming existing vertex-wise and layer-wise approaches.
📝 Abstract
Most real-world graphs are dynamic in nature, with continuous and rapid updates to the graph topology and to vertex and edge properties. Such frequent updates pose significant challenges for inferencing over Graph Neural Networks (GNNs). Current approaches that perform vertex-wise and layer-wise inferencing are impractical for dynamic graphs as they cause redundant computations, expand to large neighborhoods, and incur high communication costs in distributed setups, resulting in slow update propagation that often exceeds real-time latency requirements. This motivates the need for streaming GNN inference frameworks that are efficient and accurate over large, dynamic graphs. We propose Ripple, a framework that performs fast incremental updates of embeddings arising due to updates to the graph topology or vertex features. Ripple provides a generalized incremental programming model, leveraging the properties of the underlying aggregation functions employed by GNNs to efficiently propagate updates to the affected neighborhood and compute the exact new embeddings. Besides a single-machine design, we also extend this execution model to distributed inferencing, to support large graphs that do not fit in a single machine's memory. Ripple on a single machine achieves up to $\approx 28{,}000$ updates/sec for sparse graphs like Arxiv and $\approx 1200$ updates/sec for larger and denser graphs like Products, with latencies of $0.1$ ms--$1$ s that are required for near-real-time applications. The distributed version of Ripple offers up to $\approx 30\times$ better throughput over the baselines, due to $70\times$ lower communication costs during updates.
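The key idea behind exact incremental updates can be illustrated with a minimal sketch. This is not Ripple's actual API (all names here are hypothetical); it only shows the underlying principle the abstract alludes to: for aggregation functions with an inverse, such as sum or mean, a vertex's aggregate can be corrected in O(1) per changed neighbor by removing the neighbor's old contribution and adding its new one, rather than re-aggregating the entire neighborhood.

```python
import numpy as np

def incremental_sum_update(agg, old_feat, new_feat):
    """Exactly update a sum-aggregated embedding when one neighbor's
    feature changes: subtract the stale contribution, add the new one."""
    return agg - old_feat + new_feat

# Usage: a vertex aggregates three neighbors; neighbor 0's feature changes.
neighbors = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
agg = neighbors.sum(axis=0)                # full aggregation over all neighbors

new0 = np.array([2.0, 2.0])                # updated feature for neighbor 0
agg = incremental_sum_update(agg, neighbors[0], new0)
neighbors[0] = new0

# The incrementally maintained aggregate matches a full recomputation.
assert np.allclose(agg, neighbors.sum(axis=0))
```

Non-invertible aggregators (e.g., max) need more bookkeeping, which is one reason a generalized programming model over aggregation-function properties, as the paper proposes, is required rather than this sum-specific shortcut.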