🤖 AI Summary
To address the over-restrictiveness of the classical maximum biclique model on noisy bipartite graphs, this paper studies the problem of mining maximum edge *k*-defective bicliques (bipartite subgraphs missing at most *k* edges) and proves that it is NP-hard. It proposes an exact algorithm based on a branch-and-bound framework, strengthened by pivot-based pruning, graph reduction techniques, novel upper bounds, and a heuristic. With pivoting, the algorithm has a worst-case time complexity of *O*(*mβₖⁿ*), where *βₖ* < 2: still exponential in *n*, but with a base strictly below 2. Experiments on ten large real-world datasets show that the method outperforms state-of-the-art approaches by up to three orders of magnitude in speed across various parameter settings. This makes the relaxed subgraph model more practical for applications such as fraud detection and community detection.
📝 Abstract
The problem of identifying the maximum edge biclique in bipartite graphs has attracted considerable attention in bipartite graph analysis, with numerous real-world applications such as fraud detection, community detection, and online recommendation systems. However, real-world graphs may contain noise or incomplete information, which makes the biclique model overly restrictive. To mitigate this, we focus on a new relaxed subgraph model, called the $k$-defective biclique, which allows for up to $k$ missing edges compared to the biclique model. We investigate the problem of finding the maximum edge $k$-defective biclique in a bipartite graph, and prove that the problem is NP-hard. To tackle this computational challenge, we propose a novel algorithm based on a new branch-and-bound framework, which achieves a worst-case time complexity of $O(m\alpha_k^n)$, where $\alpha_k<2$. We further enhance this framework by incorporating a novel pivoting technique, reducing the worst-case time complexity to $O(m\beta_k^n)$, where $\beta_k<\alpha_k$. To further improve efficiency, we develop a series of optimization techniques, including graph reduction methods, novel upper bounds, and a heuristic approach. Extensive experiments on 10 large real-world datasets validate the efficiency and effectiveness of the proposed approaches. The results indicate that our algorithms consistently outperform state-of-the-art algorithms, offering up to $1000\times$ speedups across various parameter settings.
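
To make the $k$-defective biclique definition concrete, the following is a minimal Python sketch, not the paper's branch-and-bound algorithm. It assumes the graph is given as a set of `(left, right)` edge pairs and takes "maximum edge" to mean the candidate with the most edges actually present. The helper names (`missing_edges`, `is_k_defective_biclique`, `brute_force_max_edge`) and the toy graph are hypothetical, and the exhaustive search is exponential, so it is only usable on very small inputs.

```python
from itertools import combinations

# Toy bipartite graph (illustrative data, not from the paper).
LEFT = ("a", "b", "c")
RIGHT = (1, 2)
EDGES = {("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 1)}


def missing_edges(edges, left, right):
    """Count the left-right pairs that are NOT joined by an edge."""
    return sum((u, v) not in edges for u in left for v in right)


def is_k_defective_biclique(edges, left, right, k):
    """True if (left, right) misses at most k of its |left| * |right| pairs."""
    return missing_edges(edges, left, right) <= k


def brute_force_max_edge(edges, left_vertices, right_vertices, k):
    """Exhaustively find a k-defective biclique with the most present edges.

    Returns (edge_count, left_subset, right_subset). This enumerates every
    pair of non-empty subsets, so it only serves as a reference on tiny graphs.
    """
    best = (0, (), ())
    for i in range(1, len(left_vertices) + 1):
        for left in combinations(left_vertices, i):
            for j in range(1, len(right_vertices) + 1):
                for right in combinations(right_vertices, j):
                    miss = missing_edges(edges, left, right)
                    if miss > k:
                        continue  # too many missing edges for this k
                    present = len(left) * len(right) - miss
                    if present > best[0]:
                        best = (present, left, right)
    return best


if __name__ == "__main__":
    # k = 0 is the classical biclique model: {a, b} x {1, 2} has 4 edges.
    print(brute_force_max_edge(EDGES, LEFT, RIGHT, 0)[0])  # 4
    # k = 1 tolerates the missing pair (c, 2), so all 5 edges are kept.
    print(brute_force_max_edge(EDGES, LEFT, RIGHT, 1)[0])  # 5
```

On this toy graph, allowing a single missing edge lets the whole graph qualify, which is the kind of noise tolerance the relaxed model is meant to provide.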