🤖 AI Summary
This paper addresses the top-k s-biplex search problem on large bipartite graphs, where an s-biplex is a subgraph in which each vertex misses at most s neighbors from the opposite side. Enumerating all s-biplexes is neither necessary nor computationally affordable in practice, so the authors formally define the top-k s-biplex search (TBS) problem, prove it NP-hard for any fixed k ≥ 1, and propose MVBP, a branching algorithm that beats simple 2ⁿ enumeration. An improved variant, FastMVBP, adds three acceleration techniques (2-hop decomposition, single-side bounds, and progressive search) and runs in time O*(γₛ^{d₂}), where γₛ < 2 and d₂ is much smaller than the number of vertices in sparse real-world graphs. For example, d₂ is only 67 on AmazonRatings, which has more than 3 million vertices. Experiments on eight real-world and synthetic datasets show that FastMVBP outperforms the benchmark algorithms by up to three orders of magnitude in several instances.
📝 Abstract
In a bipartite graph, a subgraph is an $s$-biplex if each vertex of the subgraph is adjacent to all but at most $s$ vertices in the opposite set. The enumeration of $s$-biplexes from a given graph is a fundamental problem in bipartite graph analysis. However, in real-world data engineering, finding all $s$-biplexes is neither necessary nor computationally affordable. A more realistic problem is to identify some of the largest $s$-biplexes in a large input graph. We formulate this as the {\em top-$k$ $s$-biplex search (TBS) problem}, which aims to find the top-$k$ maximal $s$-biplexes with the most vertices, where $k$ is an input parameter. We prove that the TBS problem is NP-hard for any fixed $k\ge 1$. Then, we propose a branching algorithm, named MVBP, that breaks the $2^n$ bound of the simple enumeration algorithm. Furthermore, from a practical perspective, we investigate three techniques to improve the performance of MVBP: 2-hop decomposition, single-side bounds, and progressive search. Complexity analysis shows that the improved algorithm, named FastMVBP, has a running time of $O^*(\gamma_s^{d_2})$, where $\gamma_s<2$ and $d_2$ is a parameter much smaller than the number of vertices in sparse real-world graphs; for example, $d_2$ is only $67$ in the AmazonRatings dataset, which has more than $3$ million vertices. Finally, we conduct extensive experiments on eight real-world and synthetic datasets to demonstrate the empirical efficiency of the proposed algorithms. In particular, FastMVBP outperforms the benchmark algorithms by up to three orders of magnitude in several instances.
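To make the $s$-biplex definition concrete, the following minimal Python sketch checks whether a chosen pair of vertex sets induces an $s$-biplex. It is an illustration of the definition only, not of MVBP or FastMVBP; the function name `is_s_biplex` and the adjacency representation (a dictionary from each left vertex to its set of right neighbors) are assumptions made for this example.

```python
from typing import Dict, Hashable, Set


def is_s_biplex(
    adj: Dict[Hashable, Set[Hashable]],
    left: Set[Hashable],
    right: Set[Hashable],
    s: int,
) -> bool:
    """Return True if (left, right) induces an s-biplex.

    `adj` is a hypothetical representation: it maps each left-side vertex
    to the set of right-side vertices it is adjacent to.
    """
    # Each left vertex may be non-adjacent to at most s chosen right vertices.
    for u in left:
        if len(right - adj.get(u, set())) > s:
            return False
    # Each right vertex may be non-adjacent to at most s chosen left vertices.
    for v in right:
        missing = sum(1 for u in left if v not in adj.get(u, set()))
        if missing > s:
            return False
    return True


# Small example: "a" misses 3 and "c" misses 1, so every vertex on either
# side misses at most one vertex from the opposite set.
adj = {"a": {1, 2}, "b": {1, 2, 3}, "c": {2, 3}}
left, right = {"a", "b", "c"}, {1, 2, 3}
print(is_s_biplex(adj, left, right, 1))
print(is_s_biplex(adj, left, right, 0))
```

With $s = 0$ the check reduces to a biclique test, since no vertex may miss any vertex on the opposite side. The sketch only verifies the defining property; it does not test maximality, which the TBS problem additionally requires.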