Unsupervised Learning Under a General Semiparametric Clusterwise Elliptical Distribution: Efficient Estimation, Optimal Clustering, and Consistent Cluster Selection

📅 2026-04-09
🤖 AI Summary
This study addresses the problem of simultaneously identifying latent cluster structure, estimating model parameters, and selecting the number of clusters from continuous observations in an unsupervised setting. To this end, the authors propose a general semiparametric clusterwise elliptical distribution model and develop a two-stage algorithm. Initial cluster assignments and parameter estimates are obtained by minimizing a weighted sum of squares criterion with a separation penalty, after which pseudo-maximum likelihood estimation and cluster reassignment are alternated. Under general semiparametric clusterwise elliptical distributions, the method is reported to be the first to achieve asymptotic semiparametric efficiency, asymptotically optimal clustering accuracy, and consistent selection of the number of clusters, thereby relaxing the restrictive Gaussian assumption common in existing approaches. The theoretical analysis establishes the consistency and asymptotic efficiency of the estimators and the asymptotic optimality of the clustering accuracy, while simulations and real-data analyses demonstrate strong finite-sample performance.
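For orientation, a clusterwise elliptical model with cluster-specific means and a shared scatter matrix can be written in one common parameterization, which we assume here: latent labels G_i, mixing proportions π_k, means μ_k, a cluster-invariant scatter matrix Σ, and an unspecified density generator g playing the role of the nonparametric component. These symbols are our own notation and not necessarily the paper's. The last line is the Bayes rule, which maximizes the probability of correct membership when all quantities are known.

```latex
\begin{aligned}
&\Pr(G_i = k) = \pi_k, \qquad k = 1, \dots, K, \qquad \textstyle\sum_{k=1}^{K} \pi_k = 1,\\
&f(x \mid G_i = k) = |\Sigma|^{-1/2}\, g\big\{(x-\mu_k)^{\top} \Sigma^{-1} (x-\mu_k)\big\}, \qquad x \in \mathbb{R}^p,\\
&\text{Gaussian special case: } g(t) = (2\pi)^{-p/2} \exp(-t/2),\\
&\widehat{G}(x) = \operatorname*{arg\,max}_{1 \le k \le K} \; \pi_k \, g\big\{(x-\mu_k)^{\top} \Sigma^{-1} (x-\mu_k)\big\}.
\end{aligned}
```

Because g can absorb a rescaling of Σ, Σ is identified only up to a scalar multiple unless a normalization is imposed. When the π_k are equal and g is decreasing, the rule reduces to assigning x to the nearest μ_k in Mahalanobis distance.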
📝 Abstract
We introduce a general semiparametric clusterwise elliptical distribution to assess how latent cluster structure shapes continuous outcomes. Using a subjectwise representation, we first estimate cluster-specific mean vectors and a cluster-invariant scatter matrix by minimizing a weighted sum of squares criterion augmented with a separation penalty; we provide an initialization scheme and a computational algorithm with guaranteed convergence. This initial estimator consistently recovers the true clusters and seeds a second phase that alternates pseudo-maximum likelihood (or pseudo-maximum marginal likelihood) estimation with cluster reassignment, yielding asymptotic semiparametric efficiency and an optimal clustering that asymptotically maximizes the probability of correct membership. We also propose a semiparametric information criterion for selecting the number of clusters. Monte Carlo simulations and empirical applications demonstrate strong finite-sample performance and practical value.
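The two phases described in the abstract can be made concrete with a minimal NumPy sketch under loudly stated simplifications. This is not the authors' algorithm. Phase one is a classification-EM-style alternation with a shared Mahalanobis metric, seeded by a hypothetical principal-axis split, and it has no weights and no separation penalty. Phase two replaces pseudo-maximum likelihood with simple moment updates of the means and scatter, plus a kernel estimate of the density generator g used for Bayes-rule reassignment. All function names (`fit_params`, `stage_one`, `stage_two`, `log_generator`) are hypothetical.

```python
import numpy as np


def fit_params(X, labels, K):
    """Cluster means, one cluster-invariant scatter matrix, and mixing proportions."""
    counts = np.bincount(labels, minlength=K)
    if np.any(counts == 0):
        raise ValueError("a cluster became empty; try another K or initialization")
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    resid = X - centers[labels]                 # within-cluster residuals
    S = resid.T @ resid / len(X)                # pooled (shared) scatter
    return centers, S, counts / len(X)


def sq_mahalanobis(X, centers, S):
    """Squared Mahalanobis distance from each row of X to each center, shape (n, K)."""
    diff = X[:, None, :] - centers[None, :, :]  # (n, K, p)
    return np.einsum("nkp,pq,nkq->nk", diff, np.linalg.inv(S), diff)


def stage_one(X, K, n_iter=100):
    """Phase-one stand-in (hypothetical): split the leading principal axis into
    K groups, then alternate nearest-center assignment and refitting."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    labels = np.empty(len(X), dtype=int)
    for k, idx in enumerate(np.array_split(np.argsort(Xc @ Vt[0]), K)):
        labels[idx] = k
    for _ in range(n_iter):
        centers, S, _ = fit_params(X, labels, K)
        new_labels = sq_mahalanobis(X, centers, S).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def log_generator(t_eval, t_fit, p):
    """log g(t) up to an additive constant, from a Gaussian KDE of log T,
    using g(t) proportional to h(log t) / t**(p/2), h = density of log T."""
    s_fit = np.log(t_fit)
    bw = 1.06 * s_fit.std() * len(s_fit) ** (-0.2)   # Silverman's rule of thumb
    s_eval = np.log(t_eval)
    z = (s_eval[..., None] - s_fit) / bw
    dens = np.exp(-0.5 * z**2).mean(axis=-1) / (bw * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300) - 0.5 * p * s_eval


def stage_two(X, labels, K, n_iter=50):
    """Phase-two stand-in: refit, estimate g, reassign by argmax_k pi_k * g(d_k^2)."""
    n, p = X.shape
    for _ in range(n_iter):
        centers, S, pi = fit_params(X, labels, K)
        D = np.maximum(sq_mahalanobis(X, centers, S), 1e-12)
        t_own = D[np.arange(n), labels]         # distances to assigned centers
        score = np.log(pi) + log_generator(D, t_own, p)
        new_labels = score.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    centers, S, _ = fit_params(X, labels, K)
    return labels, centers, S
```

Typical use is `labels = stage_one(X, 2)` followed by `labels, centers, S = stage_two(X, labels, 2)`. The generator estimate relies on a standard fact: for an elliptical vector in p dimensions with generator g, the squared Mahalanobis distance T has density proportional to t^{p/2-1} g(t). Working with log T avoids the boundary at zero. The kernel estimate includes each point's own distance, so it is a heuristic plug-in rather than a leave-one-out or efficient estimator.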
Problem

Research questions and friction points this paper is trying to address.

unsupervised learning
cluster structure
semiparametric model
elliptical distribution
cluster selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

semiparametric clustering
elliptical distribution
optimal clustering
cluster selection
pseudo-maximum likelihood
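The "cluster selection" entry corresponds to the semiparametric information criterion mentioned in the abstract, whose exact form the abstract does not give. As a stand-in, a generic BIC-type rule picks the K that minimizes -2 times the maximized (pseudo-)log-likelihood plus a parameter-count penalty. The parameter count below (K mean vectors, one symmetric scatter matrix, K - 1 proportions) is our assumption and ignores both the nonparametric generator g and any scale normalization of Σ. Function names are hypothetical.

```python
import math


def n_params(K, p):
    """Free parameters: K mean vectors, one symmetric p x p scatter, K - 1 proportions.
    The nonparametric generator g is not counted (an assumption of this sketch)."""
    return K * p + p * (p + 1) // 2 + (K - 1)


def bic_type_score(loglik, K, p, n):
    """Generic BIC-style score; smaller is better."""
    return -2.0 * loglik + n_params(K, p) * math.log(n)


def select_K(logliks, p, n):
    """logliks maps each candidate K to its maximized (pseudo-)log-likelihood."""
    scores = {K: bic_type_score(ll, K, p, n) for K, ll in logliks.items()}
    return min(scores, key=scores.get), scores
```

For instance, with p = 2, n = 400, and made-up log-likelihoods of -900, -800, and -795 for K = 1, 2, 3, `select_K` returns K = 2. Moving from K = 2 to K = 3 lowers -2 log-likelihood by 10, which is less than the extra penalty of 3 log 400 ≈ 18.0.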
Jen-Chieh Teng
Data Science Degree Program, National Taiwan University, Taipei, Taiwan
Sheng-Hsin Fan
Department of Mathematics, National Taiwan University, Taipei, Taiwan
Chin-Tsang Chiang
Institute of Applied Mathematical Sciences, National Taiwan University, Taipei, Taiwan
Ming-Yueh Huang
Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
Alvin Lim
Professor of Computer Science, Auburn University
Self-organizing sensor-actuator networks, mobile and pervasive computing, wireless networks, reliable and dynamically reconfigur