Scalable Similarity Search over Large Attributed Bipartite Graphs

📅 2025-12-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for similarity search on large attributed bipartite graphs struggle to combine accurate structural modeling with computational scalability. To address this, the paper proposes the Attribute-augmented Hidden Personalized PageRank (AHPP) model, a random walk model that blends higher-order bipartite structural proximity with node attribute similarity. Similarity search is then cast as an approximate AHPP problem, for which the authors design two push-style local algorithms with provable approximation guarantees. Experiments on real-world and synthetic datasets compare AHPP against fifteen competitors and validate both its effectiveness and the efficiency of the proposed algorithms. The target setting is large graphs with millions of nodes and tens of thousands of attributes.

📝 Abstract

Bipartite graphs are widely used to model relationships between entities of two different types, with nodes divided into two disjoint sets. Similarity search, a fundamental operation that retrieves the nodes most similar to a given query node, plays a crucial role in real-world applications such as machine learning and graph clustering. However, existing state-of-the-art methods often struggle to capture the structural properties unique to bipartite graphs or fail to incorporate informative node attributes, which leads to suboptimal performance. Moreover, their high computational complexity limits scalability, making them impractical for large graphs with millions of nodes and tens of thousands of attributes. To overcome these challenges, we first introduce Attribute-augmented Hidden Personalized PageRank (AHPP), a novel random walk model that blends higher-order bipartite structural proximity with attribute similarity. We then formulate similarity search over attributed bipartite graphs as an approximate AHPP problem and propose two efficient push-style local algorithms with provable approximation guarantees. Finally, extensive experiments on real-world and synthetic datasets validate the effectiveness of AHPP and the efficiency of the proposed algorithms against fifteen competitors.
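To make the idea of a push-style local algorithm concrete, here is a minimal sketch of the classic forward-push approximation of ordinary Personalized PageRank on an unattributed bipartite graph. It is not the AHPP model or either of the paper's two algorithms, whose definitions are not given in the abstract. The function names, the toy user/item graph, and the parameters `alpha` (teleport probability) and `eps` (residual threshold) are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def build_bipartite_adjacency(edges):
    """Build an undirected adjacency list from (left, right) edge pairs."""
    adj = defaultdict(list)
    for left, right in edges:
        adj[left].append(right)
        adj[right].append(left)
    return adj


def forward_push(adj, source, alpha=0.15, eps=1e-6):
    """Approximate Personalized PageRank from `source` by forward push.

    Returns (p, r): p holds the estimates, r the leftover residuals.
    A node u is pushed while r[u] >= eps * degree(u); only nodes that
    receive residual mass are ever touched, which keeps the work local.
    Assumes every node in `adj` has at least one neighbor.
    """
    p = defaultdict(float)
    r = defaultdict(float)
    r[source] = 1.0
    queue = deque([source])
    queued = {source}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg = len(adj[u])
        if deg == 0 or r[u] < eps * deg:
            continue
        mass = r[u]
        r[u] = 0.0
        p[u] += alpha * mass                  # walk stops at u
        share = (1.0 - alpha) * mass / deg    # walk moves to a neighbor
        for v in adj[u]:
            r[v] += share
            if v not in queued and r[v] >= eps * len(adj[v]):
                queue.append(v)
                queued.add(v)
    return p, r


# Hypothetical toy graph: users on the left, items on the right.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
         ("u2", "i2"), ("u3", "i2"), ("u3", "i3")]
adj = build_bipartite_adjacency(edges)
p, r = forward_push(adj, "u1")

# Rank the other same-side nodes (users) by their estimated score.
users = {left for left, _ in edges}
ranking = sorted((u for u in users if u != "u1"),
                 key=lambda u: p[u], reverse=True)
print(ranking)
```

In this toy graph `u2` shares both of its items with the query `u1`, while `u3` shares only one, so a structure-only score should place `u2` first. An attribute-aware model such as AHPP would additionally take node attributes into account, which this sketch does not attempt.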

Problem

Research questions and friction points this paper is trying to address.

Scalable similarity search on large attributed bipartite graphs

Accurate capture of bipartite structure and node attributes

Efficient algorithms for similarity search with approximation guarantees

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-augmented Hidden Personalized PageRank model

Push-style local algorithms with approximation guarantees

Efficient similarity search on large attributed bipartite graphs
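
As background on what an approximation guarantee looks like for push-style methods, the following states the textbook bound for forward push on ordinary Personalized PageRank over an undirected graph. It is a sketch under standard assumptions, not the AHPP guarantee from the paper. The symbols are introduced here for illustration: $\alpha$ is the teleport probability, $P = D^{-1}A$ the random-walk transition matrix, $d(v)$ the degree of $v$, $p$ and $r$ the estimate and residual vectors, and $\varepsilon$ the push threshold.

```latex
% Personalized PageRank vector of source s (row-vector convention)
\pi_s = \alpha\, e_s + (1-\alpha)\, \pi_s P

% Invariant maintained by every push operation
\pi_s(t) = p(t) + \sum_{v} r(v)\, \pi_v(t)

% On an undirected graph, d(v)\,\pi_v(t) = d(t)\,\pi_t(v).
% If pushing stops once r(v) < \varepsilon\, d(v) for all v, then
0 \le \pi_s(t) - p(t)
  \le \varepsilon \sum_{v} d(v)\, \pi_v(t)
  = \varepsilon\, d(t) \sum_{v} \pi_t(v)
  = \varepsilon\, d(t)
```

The estimate for each node therefore undershoots the exact score by at most $\varepsilon$ times that node's degree, and the amount of work depends on $\alpha$ and $\varepsilon$ rather than on the size of the whole graph.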
