🤖 AI Summary
Exact counting of graph patterns in large-scale graphs is computationally prohibitive, and existing approximation systems rely on uniform sampling, which handles complex and diverse patterns inefficiently. This work proposes AGIS, a system built on a non-uniform sampling distribution that is aware of the pattern structure. AGIS combines a structure-informed neighbor sampling strategy, a practical method for approximating the ideal sampling distribution, and a mechanism that decides when the approximated distribution is worth its computational overhead, enabling efficient approximate counting of arbitrary graph patterns. Compared with the state-of-the-art AGPM system, AGIS achieves a geometric-mean speedup of 28.5× and a speedup of more than 100,000× in specific cases. The authors report that it is the only AGPM system that scales to graphs with tens of billions of edges while delivering accurate estimates within seconds.
📝 Abstract
Approximate Graph Pattern Mining (AGPM) is essential for analyzing large-scale graphs where exact counting is computationally prohibitive. Although numerous sampling-based AGPM systems exist, they all rely on uniform sampling and overlook the underlying probability distribution. This limitation restricts their scalability to a broader range of patterns. In this paper, we introduce AGIS, an extremely fast AGPM system capable of counting arbitrary patterns in huge graphs. AGIS employs structure-informed neighbor sampling, a novel sampling technique that departs from uniform sampling and instead assigns sampling probabilities based on the pattern structure. We first derive the ideal sampling distribution for AGPM and then present a practical method to approximate it. We also develop a method that balances convergence speed against computational overhead by determining when to use the approximated distribution. Experimental results demonstrate that AGIS significantly outperforms the state-of-the-art AGPM system, achieving a 28.5× geometric-mean speedup and more than 100,000× speedup in specific cases. Moreover, AGIS is the only AGPM system that scales to graphs with tens of billions of edges and robustly handles diverse patterns, providing accurate estimates within seconds. We will open-source AGIS to encourage further research in this field.
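To make the contrast between uniform and non-uniform sampling concrete, the following is a minimal sketch of a sampling-based triangle estimator in which the probability of picking the first edge is pluggable. It is not the AGIS algorithm: the toy graph, the function names, and the `common_neighbor_weight` heuristic are all assumptions chosen for illustration. The only idea it shares with the abstract is that each sample is reweighted by the inverse of its sampling probability, so the estimate stays unbiased while the choice of distribution affects how quickly it converges.

```python
import random
from itertools import combinations

# Toy undirected graph (hypothetical): a 4-clique on {0, 1, 2, 3} plus a tail 3-4-5.
DEMO_GRAPH = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}


def exact_triangles(adj):
    """Brute-force triangle count, used only as ground truth on tiny graphs."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def uniform_weight(adj, u, v):
    """Baseline: every directed edge is equally likely to be sampled."""
    return 1.0


def common_neighbor_weight(adj, u, v):
    """Illustrative structure-aware weight (an assumption, not AGIS's distribution).

    Edges with more common neighbors close more triangles, so they are
    sampled more often. The small constant keeps every weight positive.
    """
    return len(adj[u] & adj[v]) + 0.1


def estimate_triangles(adj, edge_weight, num_samples, rng):
    """Estimate the triangle count by weighted edge sampling.

    Each sample draws a directed edge (u, v) with probability proportional
    to edge_weight, then a uniform neighbor w of u. If (u, v, w) forms a
    triangle, the sample contributes 1 / (its sampling probability).
    Every triangle is reachable through 6 ordered (u, v, w) triples,
    hence the final division by 6.
    """
    edges = [(u, v) for u in adj for v in adj[u]]
    weights = [edge_weight(adj, u, v) for u, v in edges]
    total = sum(weights)
    nbrs = {u: sorted(vs) for u, vs in adj.items()}

    acc = 0.0
    picks = rng.choices(range(len(edges)), weights=weights, k=num_samples)
    for i in picks:
        u, v = edges[i]
        w = rng.choice(nbrs[u])
        if w != v and w in adj[v]:
            prob = (weights[i] / total) / len(nbrs[u])
            acc += 1.0 / prob
    return acc / (6 * num_samples)
```

Calling `estimate_triangles(DEMO_GRAPH, uniform_weight, 20000, random.Random(0))` and the same call with `common_neighbor_weight` should both land near `exact_triangles(DEMO_GRAPH)`. The weighted variant spends fewer samples on the tail edges, which can never close a triangle. Note that computing common-neighbor counts for every edge is itself expensive on a large graph, which is one way to see why a practical system would need an approximation of the ideal distribution and a policy for when to use it.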