Model-based Clustering for Network Data via a Latent Shrinkage Position Cluster Model

📅 2023-10-05
📈 Citations: 1
Influential: 0
🤖 AI Summary
Low-dimensional representation and clustering of network data usually require the latent space dimension and the number of clusters to be specified in advance, typically by fitting and comparing multiple models. This paper proposes the latent shrinkage position cluster model (LSPCM), a Bayesian latent position model that infers both quantities jointly. A Bayesian nonparametric shrinkage prior on the variance parameters of the latent positions gives higher dimensions increasingly smaller variances, so the effective dimensions are those with non-negligible variance. A sparse finite Gaussian mixture model (GMM) on the latent positions lets the number of clusters be inferred from the non-empty mixture components. Inference is performed via Markov chain Monte Carlo (MCMC). Performance is assessed in simulation studies and demonstrated on two real Twitter networks from sporting and political contexts. An open-source implementation is available.
📝 Abstract
Low-dimensional representation and clustering of network data are tasks of great interest across various fields. Latent position models are routinely used for this purpose by assuming that each node has a location in a low-dimensional latent space, and enabling node clustering. However, these models fall short in simultaneously determining the optimal latent space dimension and the number of clusters. Here we introduce the latent shrinkage position cluster model (LSPCM), which addresses this limitation. The LSPCM posits a Bayesian nonparametric shrinkage prior on the latent positions' variance parameters resulting in higher dimensions having increasingly smaller variances, aiding in the identification of dimensions with non-negligible variance. Further, the LSPCM assumes the latent positions follow a sparse finite Gaussian mixture model, allowing for automatic inference on the number of clusters related to non-empty mixture components. As a result, the LSPCM simultaneously infers the latent space dimensionality and the number of clusters, eliminating the need to fit and compare multiple models. The performance of the LSPCM is assessed via simulation studies and demonstrated through application to two real Twitter network datasets from sporting and political contexts. Open source software is available to promote widespread use of the LSPCM.
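The generative idea in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the function name `simulate_sketch`, the "1 + gamma" shrinkage factors, the Dirichlet concentration of 0.05, the scaling of the cluster means, and the logistic link on squared Euclidean distance are all assumptions made here so the example runs.

```python
import numpy as np


def simulate_sketch(n_nodes=60, max_dim=5, max_clusters=8, alpha=2.0, seed=0):
    """Draw one network from a simplified LSPCM-style generative sketch."""
    rng = np.random.default_rng(seed)

    # Shrinkage: the precision of dimension l is a cumulative product of
    # factors greater than 1, so variances strictly decrease with l.
    # (Assumption: "1 + gamma" is a stand-in for the paper's actual prior.)
    factors = 1.0 + rng.gamma(shape=2.0, scale=1.0, size=max_dim)
    variances = 1.0 / np.cumprod(factors)

    # Sparse finite mixture: a small Dirichlet concentration pushes most of
    # the max_clusters components toward zero weight, leaving them empty.
    weights = rng.dirichlet(np.full(max_clusters, 0.05))
    labels = rng.choice(max_clusters, size=n_nodes, p=weights)

    # Cluster means and node positions, both scaled per dimension.
    scale = np.sqrt(variances)
    means = 3.0 * scale * rng.normal(size=(max_clusters, max_dim))
    positions = means[labels] + scale * rng.normal(size=(n_nodes, max_dim))

    # Edge probabilities: logistic link on squared distance (assumed form).
    diff = positions[:, None, :] - positions[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    prob = 1.0 / (1.0 + np.exp(sq_dist - alpha))

    # Sample the upper triangle and mirror it for an undirected network.
    upper = np.triu(rng.random((n_nodes, n_nodes)) < prob, k=1)
    adjacency = (upper | upper.T).astype(int)
    return adjacency, positions, labels, variances


adjacency, positions, labels, variances = simulate_sketch()
print("variance per dimension:", variances.round(3))
print("non-empty clusters:", np.unique(labels).size)
```

Both upper bounds (`max_dim`, `max_clusters`) are deliberately generous here: the shrinkage makes later dimensions contribute little, and the sparse weights leave most components unused, which is the behaviour the model relies on to avoid fitting and comparing multiple models.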
Problem

Research questions and friction points this paper is trying to address.

Simultaneously determining latent space dimension and cluster number
Addressing limitations in current latent position network models
Automatically inferring the effective number of dimensions and the number of clusters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an infinite-dimensional latent space with a shrinkage prior
Employs sparse finite Gaussian mixture for clustering
Simultaneously infers latent dimension and cluster number
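As a rough illustration of how both quantities could be read off posterior draws, the sketch below counts non-empty mixture components and dimensions with non-negligible variance in each draw, then takes the most frequent value. The function names, the variance-share threshold of 0.05, and the use of the posterior mode are hypothetical choices made for this example; this summary does not state the paper's exact criterion. The toy arrays are hand-made inputs, not results.

```python
import numpy as np


def count_nonempty_clusters(labels):
    """Number of distinct cluster labels in each posterior draw (rows)."""
    return np.array([np.unique(row).size for row in labels])


def count_active_dimensions(variances, threshold=0.05):
    """Dimensions whose share of total variance exceeds a (hypothetical) threshold."""
    share = variances / variances.sum(axis=1, keepdims=True)
    return (share > threshold).sum(axis=1)


def posterior_mode(counts):
    """Most frequent value across draws (smallest value wins ties)."""
    values, freq = np.unique(counts, return_counts=True)
    return int(values[np.argmax(freq)])


# Toy inputs: 3 draws, 3 nodes, 3 candidate dimensions.
toy_labels = np.array([[0, 0, 3], [0, 3, 3], [1, 1, 1]])
toy_variances = np.array([[1.0, 0.5, 0.01], [1.0, 0.4, 0.02], [1.0, 0.01, 0.01]])

n_clusters = posterior_mode(count_nonempty_clusters(toy_labels))
n_dims = posterior_mode(count_active_dimensions(toy_variances))
print(n_clusters, n_dims)
```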
Xian Yao Gwee
School of Mathematics and Statistics, University College Dublin
I. C. Gormley
School of Mathematics and Statistics, University College Dublin
Michael Fop
Lecturer/Assistant Professor, University College Dublin
Statistics