Effective and Efficient Conductance-based Community Search at Billion Scale

📅 2025-08-02
🤖 AI Summary
This paper argues that existing community search methods focus on cohesiveness inside the community and overlook sparsity outside it. It formally introduces the Conductance-based Community Search (CCS) problem: given a query vertex, find the connected subgraph containing it with the smallest conductance, conventionally defined as the number of edges leaving the subgraph divided by the smaller of the subgraph's volume and the volume of the rest of the graph. CCS is proven NP-hard. To answer CCS queries efficiently, the authors propose SCCS, a four-stage algorithm: (1) local sampling to shrink the graph; (2) seeding to obtain an initial community; (3) expansion, which iteratively adds qualified vertices; and (4) verification, which gradually removes unqualified vertices. Together the stages target both internal cohesiveness and external sparsity. Experiments on synthetic datasets and real-world datasets, including one billion-scale graph, show the effectiveness, efficiency, and scalability of SCCS.
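To make the metric concrete, here is a minimal sketch that computes conductance for a vertex set in a small undirected graph. It assumes the standard definition (cut edges divided by the smaller of the two volumes); the helper names `build_adjacency` and `conductance` and the toy graph are illustrative and do not come from the paper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def conductance(adj, community):
    """Cut edges divided by min(volume inside, volume outside).

    Returns 1.0 when either side has zero volume (an assumed convention).
    """
    community = set(community)
    # Edges with exactly one endpoint inside the community.
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


# Toy graph (hypothetical): two triangles joined by the bridge edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
ADJ = build_adjacency(EDGES)

print(conductance(ADJ, {0, 1, 2}))  # one full triangle
print(conductance(ADJ, {0, 1}))     # part of a triangle
```

On this toy graph the full triangle scores 1/7 while the partial triangle scores 0.5, which illustrates why a low-conductance community is both dense inside and weakly attached to the rest of the graph.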

📝 Abstract
Community search is a widely studied semi-supervised graph clustering problem, retrieving a high-quality connected subgraph containing the user-specified query vertex. However, existing methods primarily focus on cohesiveness within the community but ignore the sparsity outside the community, obtaining sub-par results. Inspired by this, we adopt the well-known conductance metric to measure the quality of a community and introduce a novel problem of conductance-based community search (CCS). CCS aims at finding a subgraph with the smallest conductance among all connected subgraphs that contain the query vertex. We prove that the CCS problem is NP-hard. To efficiently query CCS, a four-stage subgraph-conductance-based community search algorithm, SCCS, is proposed. Specifically, we first greatly reduce the entire graph using local sampling techniques. Then, a three-stage local optimization strategy is employed to continuously refine the community quality. Namely, we first utilize a seeding strategy to obtain an initial community to enhance its internal cohesiveness. Then, we iteratively add qualified vertices in the expansion stage to guarantee the internal cohesiveness and external sparsity of the community. Finally, we gradually remove unqualified vertices during the verification stage. Extensive experiments on real-world datasets containing one billion-scale graph and synthetic datasets show the effectiveness, efficiency, and scalability of our solutions.
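The four stages named in the abstract can be illustrated with a deliberately simplified greedy sketch. This is not SCCS: the abstract does not specify the sampling, seeding, or qualification rules, so the code substitutes assumed stand-ins (a BFS ball of hypothetical radius `radius` for sampling, the query vertex plus its neighbors as the seed, and plain conductance improvement as the test for adding or removing a vertex). All function names are illustrative.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def conductance(adj, s):
    """Standard conductance; 1.0 if either side has zero volume (assumption)."""
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_in = sum(len(adj[u]) for u in s)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


def bfs_ball(adj, q, radius):
    """Stage 1 stand-in: keep only vertices within `radius` hops of q."""
    dist = {q: 0}
    queue = deque([q])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def is_connected(adj, s):
    """True if the subgraph induced by the non-empty set s is connected."""
    if not s:
        return False
    start = next(iter(s))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in s and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == s


def greedy_community(adj, q, radius=2):
    candidates = bfs_ball(adj, q, radius)        # stage 1: local sampling
    community = {q} | (adj[q] & candidates)      # stage 2: seed (assumed rule)
    while True:                                  # stage 3: expansion
        frontier = ({v for u in community for v in adj[u]} & candidates) - community
        best, best_phi = None, conductance(adj, community)
        for v in sorted(frontier):
            phi = conductance(adj, community | {v})
            if phi < best_phi:
                best, best_phi = v, phi
        if best is None:
            break
        community.add(best)
    improved = True                              # stage 4: verification
    while improved:
        improved = False
        current = conductance(adj, community)
        for v in sorted(community - {q}):
            rest = community - {v}
            # Drop v only if the result stays connected and conductance falls.
            if is_connected(adj, rest) and conductance(adj, rest) < current:
                community, improved = rest, True
                break
    return community


# Toy graph (hypothetical): a 4-vertex group and a triangle joined by edge (3, 4).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4), (4, 5), (4, 6), (5, 6)]
ADJ = build_adjacency(EDGES)
RESULT = greedy_community(ADJ, q=1, radius=2)
print(sorted(RESULT), conductance(ADJ, RESULT))
```

Starting from query vertex 1, the seed {0, 1, 2} has conductance 0.25; expansion adds vertex 3 and lowers it to 1/7, and the verification pass then finds nothing worth removing. A greedy procedure like this only reaches a local optimum, which is consistent with the problem being NP-hard.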
Problem

Research questions and friction points this paper is trying to address.

Finding a high-quality connected subgraph that contains the query vertex, with quality measured by conductance
Handling the NP-hardness of the conductance-based community search problem
Efficiently reducing graph size and refining community quality
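The objective behind these points can be written compactly. The formulation below assumes the standard definition of conductance on an undirected graph G = (V, E); the symbols q (query vertex), S (candidate community), vol, and φ are notation chosen here for illustration and may differ from the paper's.

```latex
\[
\phi(S) = \frac{\bigl|\{(u,v) \in E : u \in S,\ v \in V \setminus S\}\bigr|}
               {\min\{\mathrm{vol}(S),\ \mathrm{vol}(V \setminus S)\}},
\qquad
\mathrm{vol}(S) = \sum_{u \in S} \deg(u)
\]
\[
S^{*} = \operatorname*{arg\,min}_{S \subseteq V} \ \phi(S)
\quad \text{subject to} \quad q \in S \ \text{ and } \ G[S] \text{ is connected}
\]
```

The numerator rewards external sparsity and the denominator rewards internal volume, so minimizing φ addresses both properties at once.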
Innovation

Methods, ideas, or system contributions that make the work stand out.

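Uses the conductance metric to measure community quality, covering both internal cohesiveness and external sparsity
Local sampling greatly reduces the graph before optimization
Three-stage local optimization (seeding, expansion, verification) refines the community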
Authors

Longlong Lin
Southwest University
Graph Machine Learning, Graph Clustering, Similarity Search, LLM-based Graph Analysis
Yue He
Tsinghua University
Causal inference
Wei Chen
College of Computer and Information Science, Southwest University, China
Pingpeng Yuan
School of Computer Science and Technology, Huazhong University of Science and Technology, China
Rong-Hua Li
Beijing Institute of Technology
Algorithms for (big) graph, matrix, and sequence data
Tao Jia
College of Computer and Information Science, Southwest University, China