🤖 AI Summary
Existing methods for similarity search on large-scale attributed bipartite graphs struggle to achieve both high structural modeling accuracy and computational scalability. To address this, we propose the Attribute-augmented Hidden Personalized PageRank (AHPP) model, a random walk model that unifies higher-order bipartite structural proximity and node attribute similarity. We further design two push-style local approximation algorithms with provable approximation guarantees, substantially reducing computational cost. Across diverse real-world and synthetic benchmarks, AHPP is evaluated against 15 competing methods and achieves state-of-the-art performance. The approach supports efficient similarity search on graphs with millions of nodes and tens of thousands of attributes, demonstrating both high precision and strong scalability.
📝 Abstract
Bipartite graphs are widely used to model relationships between entities of different types, where nodes are divided into two disjoint sets. Similarity search, a fundamental operation that retrieves nodes similar to a given query node, plays a crucial role in various real-world applications, including machine learning and graph clustering. However, existing state-of-the-art methods often fail to accurately capture the unique structural properties of bipartite graphs or to incorporate informative node attributes, leading to suboptimal performance. Moreover, their high computational complexity limits scalability, making them impractical for large graphs with millions of nodes and tens of thousands of attributes. To overcome these challenges, we first introduce Attribute-augmented Hidden Personalized PageRank (AHPP), a novel random walk model designed to seamlessly blend higher-order bipartite structural proximity with attribute similarity. We then formulate similarity search over attributed bipartite graphs as an approximate AHPP problem and propose two efficient push-style local algorithms with provable approximation guarantees. Finally, extensive experiments on real-world and synthetic datasets against fifteen competitors validate the effectiveness of AHPP and the efficiency of the proposed algorithms.
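To illustrate what a "push-style local algorithm" looks like, here is a minimal sketch of the classic forward-push approximation of ordinary Personalized PageRank on a small, unattributed bipartite graph. This is the generic technique the abstract's algorithms build on, not AHPP itself: it ignores node attributes and the "hidden" walk structure, and the function name `forward_push`, the restart probability `alpha`, the residue threshold `r_max`, and the toy graph are all assumptions chosen for illustration. It also assumes every node has at least one neighbor.

```python
from collections import defaultdict, deque


def forward_push(adj, source, alpha=0.15, r_max=1e-6):
    """Approximate Personalized PageRank from `source` by local forward push.

    Generic textbook technique (not AHPP). `adj` maps each node to its
    neighbor list; every node is assumed to have at least one neighbor.
    Returns (reserve, residue): reserve[v] is the PPR estimate of v.
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    queue = deque([source])
    in_queue = {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        r_u = residue[u]
        residue[u] = 0.0
        # Keep an alpha fraction of the residue as settled probability mass.
        reserve[u] += alpha * r_u
        # Spread the remaining mass evenly over u's neighbors.
        share = (1.0 - alpha) * r_u / len(adj[u])
        for v in adj[u]:
            residue[v] += share
            # Only nodes whose residue is large relative to degree get pushed,
            # which keeps the computation local to the source's neighborhood.
            if residue[v] > r_max * len(adj[v]) and v not in in_queue:
                queue.append(v)
                in_queue.add(v)
    return dict(reserve), dict(residue)


# Toy user-item bipartite graph (hypothetical data), stored as undirected edges.
adj = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}

ppr, res = forward_push(adj, "u1")
# Rank items on the opposite side of the bipartition by their PPR estimate.
items = sorted((v for v in ppr if v.startswith("i")), key=ppr.get, reverse=True)
print(items)
```

Each push moves probability mass without creating or destroying it, so the reserves and residues always sum to 1, and on termination every residue is below `r_max` times the node's degree. That threshold is what trades accuracy for running time. In this toy graph the shared item `i2` ranks first for query `u1`, since every odd-length walk from `u1` reaches it with probability 0.5.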