Simplicial clustering using the $α$--transformation

📅 2025-09-07

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Clustering compositional data is complicated by the constraints of the simplex. When the α-transformation is used, the transformation parameter α must be chosen along with the number of clusters K. Method: This paper introduces two simplex-space clustering methods, α-K-means and α-GMM, which apply the α-transformation to the compositions before clustering. For α-K-means, clustering validity indices are used to choose K and α. For α-GMM, information criteria (e.g., BIC) serve this purpose. Results: Extensive simulation studies compare the performance of the two approaches, and a real dataset illustrates how they perform in a real-world setting.
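Both methods build on the α-transformation, so a small sketch of it may help. It assumes the form commonly used in the compositional-data literature. First, power-transform and re-close the parts: $u_{α,j} = x_j^α / \sum_k x_k^α$. Then map to $\mathbf z_α = \frac{1}{α} H (D\,\mathbf u_α - \mathbf 1_D)$, where $H$ is the $(D-1) \times D$ Helmert sub-matrix. The paper's exact variant is not confirmed here, and the function names are hypothetical. The case $α = 0$ is treated as the limiting isometric log-ratio (ilr).

```python
import math

def helmert_submatrix(D):
    """(D-1) x D Helmert sub-matrix: orthonormal rows, each orthogonal to the ones vector."""
    H = []
    for i in range(1, D):
        scale = 1.0 / math.sqrt(i * (i + 1))
        H.append([scale] * i + [-i * scale] + [0.0] * (D - i - 1))
    return H

def alpha_transform(x, alpha):
    """Map a composition x (D parts) to R^(D-1) with the alpha-transformation (assumed form).

    alpha == 0 is handled as the ilr limit and then requires strictly positive parts.
    """
    D = len(x)
    if alpha == 0:
        # Limit alpha -> 0: centred log-ratio coordinates.
        logs = [math.log(v) for v in x]
        mean_log = sum(logs) / D
        w = [l - mean_log for l in logs]
    else:
        # Power transform and re-close, rescale by D, centre at 1, divide by alpha.
        powered = [v ** alpha for v in x]
        total = sum(powered)
        w = [(D * p / total - 1.0) / alpha for p in powered]
    # Project onto the (D-1)-dimensional space with the Helmert sub-matrix.
    return [sum(h * v for h, v in zip(row, w)) for row in helmert_submatrix(D)]

x = [0.2, 0.3, 0.5]
print(alpha_transform(x, 0.5))
print(alpha_transform(x, 1e-6), alpha_transform(x, 0))  # nearly identical
```

For $α > 0$ the map stays defined when some parts are exactly zero, because $0^α = 0$, whereas log-ratio transformations are not. At $α = 1$ it reduces to a linear map of the raw composition.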


📝 Abstract

We introduce two simplicial clustering approaches for compositional data that adapt the $K$--means and Gaussian mixture model algorithms by employing the $α$--transformation. For the $K$--means approach, clustering validation indices determine the number of clusters and the value of $α$. For the model-based clustering approach, information criteria complete this task. Extensive simulation studies compare the performance of the two approaches, and a real dataset illustrates their performance in real-world settings.
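The abstract says that, for the $K$--means variant, validation indices choose both $K$ and $α$. It does not say which index is used. The sketch below makes two hypothetical choices: the mean silhouette width as the index, and a small grid of candidate values for $K$ and $α$. For every $(K, α)$ pair it runs Lloyd's $K$--means on the α-transformed data and keeps the pair with the highest score. The silhouette is computed separately in each α's transformed space, so comparing scores across α is also an assumption of this sketch, not a documented detail of the paper.

```python
import math
import random

# Assumed alpha-transformation: power transform, centre, project with the Helmert
# sub-matrix. alpha = 0 is the ilr limit and needs strictly positive parts.
def helmert_submatrix(D):
    H = []
    for i in range(1, D):
        s = 1.0 / math.sqrt(i * (i + 1))
        H.append([s] * i + [-i * s] + [0.0] * (D - i - 1))
    return H

def alpha_transform(x, alpha):
    D = len(x)
    if alpha == 0:
        logs = [math.log(v) for v in x]
        m = sum(logs) / D
        w = [l - m for l in logs]
    else:
        p = [v ** alpha for v in x]
        t = sum(p)
        w = [(D * q / t - 1.0) / alpha for q in p]
    return [sum(h * v for h, v in zip(row, w)) for row in helmert_submatrix(D)]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, n_init=10, iters=100, seed=0):
    """Lloyd's algorithm with random restarts; returns labels of the lowest-SSE run."""
    rng = random.Random(seed)
    best_sse, best_labels = math.inf, None
    for _ in range(n_init):
        centres = [list(c) for c in rng.sample(points, k)]
        labels = None
        for _ in range(iters):
            new = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
            if new == labels:
                break
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an emptied cluster keeps its previous centre
                    centres[j] = [sum(col) / len(members) for col in zip(*members)]
        sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_sse, best_labels = sse, labels
    return best_labels

def silhouette(points, labels):
    """Mean silhouette width with Euclidean distances; singletons contribute 0."""
    if len(set(labels)) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.sqrt(sq_dist(p, q)))
        own = by_cluster.get(labels[i])
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def select_k_alpha(comps, ks=(2, 3, 4), alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid search: keep the (K, alpha) pair with the highest mean silhouette."""
    best = None
    for alpha in alphas:
        z = [alpha_transform(x, alpha) for x in comps]
        for k in ks:
            labels = kmeans(z, k)
            score = silhouette(z, labels)
            if best is None or score > best[0]:
                best = (score, k, alpha, labels)
    return best

# Usage on synthetic data: three groups of 20 noisy compositions each.
rng = random.Random(1)

def noisy(centre):
    v = [c * math.exp(rng.gauss(0, 0.1)) for c in centre]
    s = sum(v)
    return [x / s for x in v]

groups = ([0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7])
data = [noisy(c) for c in groups for _ in range(20)]
score, k, alpha, labels = select_k_alpha(data)
print(f"K = {k}, alpha = {alpha}, mean silhouette = {score:.3f}")
```

If some parts are exactly zero, α = 0 has to be dropped from the grid, since the ilr limit is undefined for zeros.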

Problem

Research questions and friction points this paper is trying to address.

Develop simplicial clustering methods for compositional data

Adapt K-means and Gaussian mixture models using α-transformation

Determine the number of clusters and the value of α via validation indices (K-means) or information criteria (Gaussian mixtures)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simplicial clustering with the α-transformation

Adapted K-means and Gaussian mixture models

Clustering validation indices and information criteria select K and α
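For the model-based approach the abstract only says that information criteria choose $K$ and $α$. The AI summary names BIC as an example. Suppose BIC is used, with unrestricted (full) covariance matrices for $K$ Gaussian components in the $d = D - 1$ dimensional α-transformed space. The criterion would then take the standard form below. The symbols $\ell$, $p_K$, $\pi_k$, $\phi_d$, $\boldsymbol\mu_k$ and $\boldsymbol\Sigma_k$ are introduced here for illustration and are not taken from the paper.

```latex
\mathrm{BIC}(K, \alpha) = -2\,\ell_{K,\alpha} + p_K \log n,
\qquad
\ell_{K,\alpha} = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k\,
  \phi_d\!\left(\mathbf{z}_\alpha(\mathbf{x}_i);\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)

p_K = \underbrace{(K - 1)}_{\text{weights}}
    + \underbrace{K d}_{\text{means}}
    + \underbrace{K \tfrac{d(d+1)}{2}}_{\text{covariances}},
\qquad d = D - 1

\text{e.g. } D = 3,\ K = 3:\quad d = 2,\quad p_K = 2 + 6 + 9 = 17
```

With this sign convention the pair $(K, α)$ with the smallest BIC is preferred. Some software, such as the R package mclust, reports the negated value and maximises it instead. Each α defines a different space, so comparing $\ell_{K,α}$ across α values generally requires the Jacobian of the transformation. The abstract does not state how the paper handles this.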

🔎 Similar Papers

HUMAP: Hierarchical Uniform Manifold Approximation and Projection