Scalable Similarity Search over Large Attributed Bipartite Graphs

📅 2025-12-12
🤖 AI Summary
Existing methods for similarity search on large-scale attributed bipartite graphs struggle to achieve both accurate structural modeling and computational scalability. To address this, we propose the Attribute-augmented Hidden Personalized PageRank (AHPP) model, which unifies higher-order structural proximity and node attribute similarity within a single random walk framework. We further design two local push-based approximation algorithms with provable error bounds, which reduce both time and space cost. AHPP is evaluated on real-world and synthetic benchmarks against 15 baseline methods. The approach supports efficient similarity search on graphs with up to ten million nodes and ten thousand attribute dimensions, demonstrating both high precision and strong scalability.

📝 Abstract
Bipartite graphs are widely used to model relationships between entities of different types, where nodes are divided into two disjoint sets. Similarity search, a fundamental operation that retrieves nodes similar to a given query node, plays a crucial role in various real-world applications, including machine learning and graph clustering. However, existing state-of-the-art methods often fail to capture the unique structural properties of bipartite graphs accurately or to incorporate informative node attributes, leading to suboptimal performance. Moreover, their high computational complexity limits scalability, making them impractical for large graphs with millions of nodes and tens of thousands of attributes. To overcome these challenges, we first introduce Attribute-augmented Hidden Personalized PageRank (AHPP), a novel random walk model designed to seamlessly blend higher-order bipartite structural proximity with attribute similarity. We then formulate similarity search over attributed bipartite graphs as an approximate AHPP problem and propose two efficient push-style local algorithms with provable approximation guarantees. Finally, extensive experiments on real-world and synthetic datasets validate the effectiveness of AHPP and the efficiency of the proposed algorithms in comparison with fifteen competitors.
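
The push-style local computation mentioned in the abstract builds on a well-known idea for approximating Personalized PageRank. The sketch below is the textbook forward push procedure run on a tiny user-item bipartite graph; it is not the paper's AHPP algorithm and it ignores node attributes entirely. The function name `forward_push`, the parameters `alpha` and `eps`, and the toy graph are all assumptions introduced for illustration.

```python
from collections import defaultdict, deque


def forward_push(adj, source, alpha=0.15, eps=1e-6):
    """Textbook forward push for approximate Personalized PageRank.

    adj    : dict mapping each node to a list of its neighbours
             (undirected graph, every node has at least one neighbour)
    source : query node
    alpha  : teleport (restart) probability   -- illustrative default
    eps    : per-degree residue threshold     -- illustrative default
    Returns (reserve, residue); reserve[v] is the PPR estimate for v.
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    queue = deque([source])
    queued = {source}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg = len(adj[u])
        r = residue[u]
        if r < eps * deg:
            continue  # residue too small to be worth pushing
        # Keep an alpha fraction at u, spread the rest evenly to neighbours.
        reserve[u] += alpha * r
        residue[u] = 0.0
        share = (1.0 - alpha) * r / deg
        for v in adj[u]:
            residue[v] += share
            if v not in queued and residue[v] >= eps * len(adj[v]):
                queue.append(v)
                queued.add(v)
    return dict(reserve), dict(residue)


# Hypothetical toy bipartite graph: users u1..u3 on one side, items i1..i3 on the other.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
         ("u2", "i2"), ("u3", "i2"), ("u3", "i3")]
adj = defaultdict(list)
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

reserve, residue = forward_push(adj, "u1")

# Rank the other users by their estimated PPR score with respect to u1.
users = sorted((v for v in reserve if v.startswith("u") and v != "u1"),
               key=reserve.get, reverse=True)
print(users)  # u2 shares two items with u1, u3 only one, so u2 ranks first
```

Because each push touches only the neighbors of a single node, the amount of work depends on `eps` and on the degrees of the nodes actually reached, not on the size of the whole graph, which is what makes push-style methods attractive for large inputs.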

Problem

Research questions and friction points this paper is trying to address.

Scalable similarity search on large attributed bipartite graphs
Accurate capture of bipartite structure and node attributes
Efficient algorithms for similarity search with approximation guarantees

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-augmented Hidden Personalized PageRank model
Push-style local algorithms with approximation guarantees
Efficient similarity search on large attributed bipartite graphs

Authors

Xi Ou
College of Computer and Information Science, Southwest University
Longlong Lin
Southwest University
Graph Machine Learning, Graph Clustering, Similarity Search, LLM-based Graph Analysis
Zeli Wang
Chongqing University of Posts and Telecommunications
Pingpeng Yuan
School of Computer Science and Technology, Huazhong University of Science and Technology, China
Rong-Hua Li
Beijing Institute of Technology
Algorithms for (big) graph, matrix, and sequence data