Graph-based Semi-supervised and Unsupervised Methods for Local Clustering

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the discovery of local substructures (local clustering) in large graphs when labels are scarce (semi-supervised) or absent (unsupervised). We propose the first unified, sparsity-driven framework that covers both settings: it combines random graph sampling, localized diffusion, and node co-membership analysis to identify local clusters efficiently. Theoretically, we establish co-membership conditions and correctness guarantees, with each cluster obtained as a sparse solution to a linear system associated with the graph Laplacian. The method avoids global optimization, keeps computational overhead low, and outperforms existing local clustering algorithms, achieving state-of-the-art performance even when labeled nodes constitute only 0.1% of the graph.

📝 Abstract
Local clustering aims to identify specific substructures within a large graph without requiring full knowledge of the entire graph. These substructures are typically small compared to the overall graph, so the problem can be approached by finding a sparse solution to a linear system associated with the graph Laplacian. In this work, we first propose a method for identifying specific local clusters when very little labeled data is available, which we term semi-supervised local clustering. We then extend this approach to the unsupervised setting, where no prior label information is available. The proposed methods randomly sample the graph, apply diffusion through local cluster extraction, and then examine the overlap among the results to find each cluster. We establish co-membership conditions for any pair of nodes and rigorously prove the correctness of our methods. Additionally, we conduct extensive experiments demonstrating that the proposed methods achieve state-of-the-art results in the low-label-rate regime.
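To make the "sparse solution to a linear system associated with the graph Laplacian" idea concrete, here is a minimal sketch under strong assumptions; it is not the paper's algorithm. It assumes an idealized graph in which the target cluster is a connected component, so the cluster's indicator vector lies in the null space of the Laplacian L. Given a labeled seed set, the remaining cluster members then solve L[:, rest] x = -L[:, seeds] 1. The helper names (`laplacian`, `recover_cluster`) are hypothetical, and a minimum-norm least-squares solve stands in for a true sparse-recovery solver.

```python
import numpy as np

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def recover_cluster(A, seeds, threshold=0.5):
    """Recover the cluster containing `seeds` (idealized, hypothetical helper).

    Assumes the cluster is an isolated connected component, so that
    L @ indicator(cluster) == 0 holds exactly.
    """
    n = len(A)
    L = laplacian(A)
    seeds = np.asarray(seeds)
    rest = np.setdiff1d(np.arange(n), seeds)
    # Columns of L over the whole cluster sum to zero, so the unknown
    # members must cancel the contribution of the seed columns.
    y = -L[:, seeds].sum(axis=1)
    # Stand-in for sparse recovery: minimum-norm least-squares solution.
    x, *_ = np.linalg.lstsq(L[:, rest], y, rcond=1e-10)
    return np.sort(np.concatenate([seeds, rest[x > threshold]]))

# Toy graph: a 4-clique (nodes 0-3) and a 6-clique (nodes 4-9), disconnected.
A = np.zeros((10, 10))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)

cluster = recover_cluster(A, seeds=[0])
print(cluster)
```

On this toy graph the recovered entries are close to 1 on the seed's component and close to 0 elsewhere, so thresholding returns nodes 0 through 3. On real graphs, clusters are only weakly separated from the rest of the graph, the system holds only approximately, and a genuinely sparsity-promoting solver would be needed.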
Problem

Research questions and friction points this paper is trying to address.

Identifying local clusters in large graphs with minimal labeled data
Extending local clustering to unsupervised settings without label information
Achieving state-of-the-art results in low-label rate scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based semi-supervised local clustering method
Unsupervised clustering via random graph sampling
Diffusion and overlap analysis for cluster extraction
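
The sampling, diffusion, and overlap steps listed above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the diffusion is a plain lazy random walk, the cluster size `k` is assumed known, and all function names, the step count, and the co-membership threshold are hypothetical choices made for the example.

```python
import numpy as np

def two_clique_graph(size=5):
    """Adjacency matrix of two cliques joined by a single bridge edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0
    return A

def diffuse_and_extract(A, seed, k, steps=3):
    """Run a lazy random walk from `seed` and keep the k heaviest nodes."""
    W = A / A.sum(axis=1)[:, None]      # row-stochastic transition matrix
    p = np.zeros(len(A))
    p[seed] = 1.0
    for _ in range(steps):
        p = 0.5 * p + 0.5 * (W.T @ p)   # stay put or move, each with prob. 1/2
    return np.argsort(-p)[:k]

def co_membership_clusters(A, k, n_seeds, rng, threshold=0.5):
    """Group nodes that are repeatedly extracted together (assumed rule)."""
    n = len(A)
    counts = np.zeros((n, n))
    for s in rng.permutation(n)[:n_seeds]:      # random seeds, no repeats
        members = diffuse_and_extract(A, s, k)
        counts[np.ix_(members, members)] += 1.0
    appear = np.diag(counts)                    # times each node was extracted
    denom = np.minimum.outer(appear, appear)
    freq = np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)
    linked = freq >= threshold
    # Connected components of the thresholded co-membership graph.
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0 or appear[start] == 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(linked[u]):
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

A = two_clique_graph()
labels = co_membership_clusters(A, k=5, n_seeds=6, rng=np.random.default_rng(0))
print(labels)
```

With six distinct seeds on ten nodes, each clique receives at least one seed, so every node is extracted at least once and the two cliques come out as two separate labels. A node that is never extracted keeps the label -1, which is one reason a practical version would sample more seeds than this toy does.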
Zhaiming Shen
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332
Sung Ha Kang
School of Mathematics, Georgia Institute of Technology