Model-based Clustering for Network Data via a Latent Shrinkage Position Cluster Model

📅 2023-10-05

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Latent position models for network data usually require the analyst to fix both the latent space dimension and the number of clusters in advance. This paper proposes the latent shrinkage position cluster model (LSPCM), which infers both quantities jointly. A Bayesian nonparametric shrinkage prior on the variance parameters of the latent positions identifies the effective dimensionality, and a sparse finite Gaussian mixture model (GMM) determines the number of clusters from the non-empty mixture components. Inference is fully Bayesian, via Markov chain Monte Carlo (MCMC). The model is assessed in simulation studies and applied to two real Twitter networks from sporting and political contexts. Because a single fit yields both quantities, there is no need to fit and compare multiple models over candidate dimensions and cluster counts. An open-source implementation is provided.

📝 Abstract

Low-dimensional representation and clustering of network data are tasks of great interest across various fields. Latent position models are routinely used for this purpose by assuming that each node has a location in a low-dimensional latent space, and enabling node clustering. However, these models fall short in simultaneously determining the optimal latent space dimension and the number of clusters. Here we introduce the latent shrinkage position cluster model (LSPCM), which addresses this limitation. The LSPCM posits a Bayesian nonparametric shrinkage prior on the latent positions' variance parameters, resulting in higher dimensions having increasingly smaller variances, aiding in the identification of dimensions with non-negligible variance. Further, the LSPCM assumes the latent positions follow a sparse finite Gaussian mixture model, allowing for automatic inference on the number of clusters related to non-empty mixture components. As a result, the LSPCM simultaneously infers the latent space dimensionality and the number of clusters, eliminating the need to fit and compare multiple models. The performance of the LSPCM is assessed via simulation studies and demonstrated through application to two real Twitter network datasets from sporting and political contexts. Open-source software is available to promote widespread use of the LSPCM.
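
The shrinkage idea in the abstract can be illustrated with a minimal sketch. It assumes a multiplicative gamma-type construction in which each dimension's precision is the running product of multipliers that are at least 1, so the variances cannot increase with the dimension index. The shifted gamma draws, the hyperparameters `a1` and `a2`, the truncation level `p`, and the 1% threshold in `effective_dimension` are illustrative assumptions, not the paper's specification.

```python
import numpy as np


def sample_shrinking_variances(p, rng, a1=2.0, a2=3.0):
    """Draw p variances that are non-increasing in the dimension index.

    Assumption: multipliers for dimensions 2..p are 1 + Gamma(a2, 1) draws,
    a simple stand-in that keeps every multiplier at or above 1.
    """
    delta = np.empty(p)
    delta[0] = rng.gamma(shape=a1, scale=1.0)
    delta[1:] = 1.0 + rng.gamma(shape=a2, scale=1.0, size=p - 1)
    precision = np.cumprod(delta)  # running product, non-decreasing after dim 1
    return 1.0 / precision


def effective_dimension(variances, threshold=0.01):
    """Count dimensions whose share of the total variance exceeds threshold."""
    share = variances / variances.sum()
    return int(np.sum(share > threshold))


rng = np.random.default_rng(0)
variances = sample_shrinking_variances(p=8, rng=rng)
p_eff = effective_dimension(variances)
print("variances:", np.round(variances, 4))
print("dimensions with non-negligible variance:", p_eff)
```

In a fitted model the variances would be posterior draws rather than prior draws, and the rule for calling a dimension negligible is a modelling choice. The fixed threshold here only shows the mechanics.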

Problem

Research questions and friction points this paper is trying to address.

Simultaneously determining latent space dimension and cluster number

Addressing limitations in current latent position network models

Automatically inferring the effective dimension and the number of clusters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an infinite-dimensional latent space with shrinkage priors

Employs a sparse finite Gaussian mixture model for clustering

Simultaneously infers latent dimension and cluster number
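
To make the clustering component concrete, the following sketch simulates a network from an overfitted mixture and counts the non-empty components. It assumes a symmetric Dirichlet prior with a small concentration on `k_max` component weights and a logistic link on squared Euclidean distance between latent positions. The function name `simulate_clustered_network`, the intercept `alpha`, the fixed `variances`, and the spread of the component means are hypothetical choices for illustration, not taken from the paper or its software.

```python
import numpy as np


def simulate_clustered_network(n, k_max, variances, rng,
                               concentration=0.1, alpha=1.0):
    """Simulate an undirected network from a sparse-mixture latent position model."""
    p = len(variances)
    sd = np.sqrt(variances)
    # A small concentration pushes most of the k_max weights towards zero,
    # so many components end up with no nodes assigned.
    weights = rng.dirichlet(np.full(k_max, concentration))
    labels = rng.choice(k_max, size=n, p=weights)
    # Component means and node positions, scaled per dimension.
    means = rng.normal(scale=3.0 * sd, size=(k_max, p))
    positions = means[labels] + rng.normal(scale=sd, size=(n, p))
    # Assumed link: logit P(edge between i and j) = alpha - ||z_i - z_j||^2.
    diff = positions[:, None, :] - positions[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    prob = 1.0 / (1.0 + np.exp(-(alpha - sq_dist)))
    # Sample the upper triangle only, then mirror it (no self-loops).
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adjacency = (upper | upper.T).astype(int)
    return adjacency, labels


rng = np.random.default_rng(1)
variances = np.array([1.0, 0.5, 0.05])  # shrinking variances, fixed for the demo
adjacency, labels = simulate_clustered_network(
    n=60, k_max=10, variances=variances, rng=rng
)
n_clusters = len(np.unique(labels))  # number of non-empty components
print("non-empty components:", n_clusters, "of 10")
print("edges:", adjacency.sum() // 2)
```

The number of clusters is read off as the count of non-empty components, which is why `k_max` only needs to be an upper bound rather than a tuned value.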

🔎 Similar Papers

Improved Community Detection using Stochastic Block Models