Optimal Graph Clustering without Edge Density Signals

📅 2025-10-24
🤖 AI Summary
Classical graph clustering models such as the Stochastic Block Model (SBM) and the Degree-Corrected Block Model (DCBM) cannot recover clusters once the contrast between within-cluster and between-cluster edge densities vanishes. This paper studies the Popularity-Adjusted Block Model (PABM), which assigns separate popularity parameters to intra- and inter-cluster connections. Theoretically, it characterizes the optimal clustering error rate under PABM and shows that cluster recovery remains possible even when global edge densities carry no discriminative signal, provided the intra- and inter-cluster popularity coefficients differ. Methodologically, because the expected adjacency matrix of PABM has rank between $k$ and $k^2$, the paper considers spectral embeddings built from $k^2$ eigenvectors rather than only the top $k$. Experiments on synthetic and real-world networks indicate that spectral clustering with $k^2$ eigenvectors outperforms conventional $k$-dimensional spectral clustering.

📝 Abstract
This paper establishes the theoretical limits of graph clustering under the Popularity-Adjusted Block Model (PABM), addressing limitations of existing models. In contrast to the Stochastic Block Model (SBM), which assumes uniform vertex degrees, and to the Degree-Corrected Block Model (DCBM), which applies uniform degree corrections across clusters, PABM introduces separate popularity parameters for intra- and inter-cluster connections. Our main contribution is the characterization of the optimal error rate for clustering under PABM, which provides novel insights on clustering hardness: we demonstrate that unlike SBM and DCBM, cluster recovery remains possible in PABM even when traditional edge-density signals vanish, provided intra- and inter-cluster popularity coefficients differ. This highlights a dimension of degree heterogeneity captured by PABM but overlooked by DCBM: local differences in connectivity patterns can enhance cluster separability independently of global edge densities. Finally, because PABM exhibits a richer structure, its expected adjacency matrix has rank between $k$ and $k^2$, where $k$ is the number of clusters. As a result, spectral embeddings based on the top $k$ eigenvectors may fail to capture important structural information. Our numerical experiments on both synthetic and real datasets confirm that spectral clustering algorithms incorporating $k^2$ eigenvectors outperform traditional spectral approaches.
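The rank claim in the abstract can be checked numerically. The sketch below is not taken from the paper: it assumes the commonly used PABM parameterization $P_{ij} = \lambda_{i, z_j}\,\lambda_{j, z_i}$, where $z_i$ is the cluster of node $i$ and $\lambda_{i,b}$ is the popularity of node $i$ toward cluster $b$. The helper name `pabm_expected_adjacency`, the sizes, and the parameter ranges are all illustrative choices. A DCBM is written in the same form by setting $\lambda_{i,b} = \theta_i \sqrt{B_{z_i b}}$, which ties each node's popularities to a single degree parameter.

```python
import numpy as np

def pabm_expected_adjacency(labels, lam):
    """Expected adjacency under the assumed form P[i, j] = lam[i, z_j] * lam[j, z_i]."""
    labels = np.asarray(labels)
    left = lam[:, labels]      # left[i, j] = lam[i, z_j]
    return left * left.T       # left.T[i, j] = lam[j, z_i]

rng = np.random.default_rng(0)
k, n_per = 3, 40               # illustrative sizes, not from the paper
n = k * n_per
labels = np.repeat(np.arange(k), n_per)

# Generic PABM: every node has its own popularity toward every cluster.
lam_pabm = rng.uniform(0.1, 0.9, size=(n, k))
P_pabm = pabm_expected_adjacency(labels, lam_pabm)

# DCBM as a special case: one degree parameter theta_i per node,
# scaled by a symmetric block matrix B.
theta = rng.uniform(0.5, 1.0, size=n)
B = np.full((k, k), 0.1) + 0.4 * np.eye(k)
lam_dcbm = theta[:, None] * np.sqrt(B[labels])
P_dcbm = pabm_expected_adjacency(labels, lam_dcbm)

rank_pabm = np.linalg.matrix_rank(P_pabm)
rank_dcbm = np.linalg.matrix_rank(P_dcbm)
print("PABM rank:", rank_pabm, "| DCBM rank:", rank_dcbm, "| k:", k, "| k^2:", k**2)
```

With generic popularity parameters the PABM matrix should reach the upper end of the range, $k^2$, while the DCBM special case stays at $k$. This is the structural reason a $k$-dimensional embedding can discard information that PABM encodes.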

Problem

Research questions and friction points this paper is trying to address.

Characterizing optimal clustering error rates under the Popularity-Adjusted Block Model
Enabling cluster recovery when traditional edge-density signals vanish
Addressing limitations of spectral embeddings that use only the top $k$ eigenvectors

Innovation

Methods, ideas, or system contributions that make the work stand out.

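Builds on PABM's separate intra- and inter-cluster popularity parameters
Shows that cluster recovery remains possible without traditional edge-density signals, provided intra- and inter-cluster popularity coefficients differ
Uses spectral clustering with $k^2$ eigenvectors instead of only the top $k$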
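
As a rough illustration of why $k^2$ eigenvectors can matter, the sketch below builds a PABM expected adjacency matrix, samples a graph from it, and compares embeddings of dimension $k$ and $k^2$. It is a minimal sketch under stated assumptions, not the paper's algorithm: the parameterization $P_{ij} = \lambda_{i, z_j}\,\lambda_{j, z_i}$, the helper names `top_eigvecs` and `energy_captured`, and all sizes are hypothetical, and the clustering step that would follow the embedding is left out.

```python
import numpy as np

def top_eigvecs(M, d):
    """Return the d eigenpairs of symmetric M with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(np.abs(vals))[::-1][:d]   # PABM can have negative eigenvalues
    return vals[order], vecs[:, order]

def energy_captured(M, d):
    """Fraction of squared Frobenius norm of M retained by its best rank-d approximation."""
    vals = np.linalg.eigvalsh(M)
    sq = np.sort(vals ** 2)[::-1]
    return sq[:d].sum() / sq.sum()

rng = np.random.default_rng(1)
k, n_per = 3, 50                                 # illustrative sizes
n = k * n_per
labels = np.repeat(np.arange(k), n_per)

# Assumed PABM form: P[i, j] = lam[i, z_j] * lam[j, z_i].
lam = rng.uniform(0.1, 0.9, size=(n, k))
left = lam[:, labels]
P = left * left.T

# Sample a symmetric, loop-free adjacency matrix with edge probabilities P.
upper = np.triu(rng.random((n, n)) < P, 1)
A = (upper | upper.T).astype(float)

# Embeddings of the observed graph: n x k versus n x k^2.
_, X_k = top_eigvecs(A, k)
_, X_k2 = top_eigvecs(A, k ** 2)

# At the population level, k eigenvectors miss part of P; k^2 recover all of it.
frac_k = energy_captured(P, k)
frac_k2 = energy_captured(P, k ** 2)
print(f"energy captured with k={k}: {frac_k:.6f}, with k^2={k**2}: {frac_k2:.6f}")
```

The rows of `X_k2` would then be passed to a clustering routine. Eigenvectors are ordered by absolute eigenvalue because the expected adjacency matrix under this parameterization is generally indefinite.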