🤖 AI Summary
Determining the optimal number of clusters and selecting the transformation parameter α remain challenging in compositional data clustering, owing to the lack of theoretical guidance for choosing α and the inherent constraints of the simplex. Method: This paper introduces two simplex-space clustering methods, α-K-means and α-GMM, that incorporate the α-transformation into K-means and Gaussian mixture model clustering. α-K-means uses clustering validity indices, and α-GMM uses information criteria (e.g., BIC), to select the number of clusters K and the transformation parameter α together. Results: Extensive experiments on synthetic datasets and multiple real-world compositional datasets (including microbiome and geochemical compositions) demonstrate that the proposed methods significantly improve clustering accuracy and robustness over standard benchmarks such as log-ratio-transformed K-means and Dirichlet mixture models. The framework offers greater flexibility, statistical rigor, and interpretability for compositional data analysis.
📝 Abstract
We introduce two simplicial clustering approaches for compositional data that adapt the $K$-means and Gaussian mixture model algorithms by employing the $\alpha$-transformation. For $K$-means, clustering validation indices are used to decide on the number of clusters and to choose the value of $\alpha$, while for the model-based clustering approach information criteria serve this purpose. Extensive simulation studies compare the performance of the two approaches, and a real data set illustrates their performance in a real-world setting.
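The abstract does not spell out the $\alpha$-transformation, so the following is only a minimal sketch of one common formulation, assumed here rather than taken from the paper: the power-transformed composition $u_\alpha(x) = x^\alpha / \sum_j x_j^\alpha$ is centred and scaled as $(D\,u_\alpha(x) - 1)/\alpha$ and then multiplied by the Helmert sub-matrix, with the centred log-ratio used as the limiting case $\alpha = 0$. The function names `helmert_submatrix` and `alpha_transform` are hypothetical and chosen for illustration.

```python
import math


def helmert_submatrix(D):
    """Return the (D-1) x D Helmert sub-matrix as a list of rows.

    Rows are orthonormal and each is orthogonal to the vector of ones.
    """
    H = []
    for i in range(1, D):
        scale = 1.0 / math.sqrt(i * (i + 1))
        # i equal positive entries, one negative entry, then zeros
        H.append([scale] * i + [-i * scale] + [0.0] * (D - i - 1))
    return H


def alpha_transform(x, alpha):
    """Map a composition x (positive parts summing to 1) into R^(D-1).

    Assumed formulation (not confirmed by the abstract):
      alpha != 0: w_i = (D * x_i**alpha / sum_j x_j**alpha - 1) / alpha
      alpha == 0: w_i = log(x_i) - mean_j log(x_j)   (limiting case)
    followed by multiplication with the Helmert sub-matrix.
    """
    D = len(x)
    if alpha == 0:
        logs = [math.log(v) for v in x]
        mean_log = sum(logs) / D
        w = [l - mean_log for l in logs]
    else:
        powered = [v ** alpha for v in x]
        total = sum(powered)
        w = [(D * p / total - 1.0) / alpha for p in powered]
    H = helmert_submatrix(D)
    return [sum(h * wi for h, wi in zip(row, w)) for row in H]


if __name__ == "__main__":
    composition = [0.5, 0.3, 0.2]
    for a in (0, 0.5, 1):
        print(a, alpha_transform(composition, a))
```

Under this sketch, each candidate $\alpha$ yields a set of transformed points in $\mathbb{R}^{D-1}$ that could be passed to any standard $K$-means or Gaussian mixture routine. The choice of $K$ and $\alpha$ would then be made by comparing a validation index or an information criterion across the grid, as the abstract describes.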