Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence

📅 2026-06-05
🤖 AI Summary
This study addresses component recovery and mixing-matrix estimation for unlabeled finite mixtures. Several observable distributions share the same latent components but have unknown mixing weights, and no labels or clean component samples are available. The authors develop an identifiability theory driven by marginal independence, in which each component is assumed to be independent on at least one coordinate pair. They show that, under full-rank and no-cancellation conditions, marginally independent affine combinations of the observable mixtures recover the latent components. The theory also separates the empirical roles of two assumptions. Irreducibility is generally not testable from the unlabeled mixtures alone, whereas marginal independence admits a held-out diagnostic. For estimation, they propose the Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator. It searches over affine combinations of the observable mixtures using an MMD-based criterion, and they prove uniform convergence and stability under approximate marginal independence. The experiments use controlled data and flow-cytometry data. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, matrix-factorization, and pairwise mixture-proportion baselines.
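The recovery step behind the affine-combination analysis can be written out as a short derivation. The notation is assumed for illustration and may differ from the paper's. It uses latent components P_j, observable mixtures Q_k, a K-by-J row-stochastic mixing matrix M, and affine weights a. The derivation covers only the linear-algebra step. It does not cover the full-rank, no-cancellation, or completion conditions stated in the paper.

```latex
Q_k = \sum_{j=1}^{J} M_{kj} P_j, \qquad \sum_{j=1}^{J} M_{kj} = 1, \qquad k = 1, \dots, K.

\sum_{k=1}^{K} a_k Q_k
  = \sum_{j=1}^{J} \Bigl( \sum_{k=1}^{K} a_k M_{kj} \Bigr) P_j
  = \sum_{j=1}^{J} (M^\top a)_j \, P_j ,
\qquad
\sum_{j=1}^{J} (M^\top a)_j = \sum_{k=1}^{K} a_k .

\operatorname{rank}(M) = J \;\Longrightarrow\; \exists\, a : M^\top a = e_j
  \;\Longrightarrow\; \sum_{k} a_k Q_k = P_j, \quad \mathbf{1}^\top a = 1 .

\text{Example: } M = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix},\;
a = \bigl(\tfrac{4}{3}, -\tfrac{1}{3}\bigr)
\;\Longrightarrow\; M^\top a = (1, 0)^\top,\;
\tfrac{4}{3} Q_1 - \tfrac{1}{3} Q_2 = P_1 .
```

Since M is unknown in practice, these weights cannot be computed directly. Per the abstract, marginal independence is the signal used to decide which affine combinations are latent components.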
📝 Abstract
We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.
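To make the PM-MMD idea concrete, here is a minimal sketch under stated assumptions. It is not the authors' estimator. The following are all illustrative choices, not taken from the paper:

- the Gaussian kernel and its bandwidth;
- the use of a single coordinate pair;
- the biased (V-statistic) estimate;
- the toy data;
- the hypothetical helper names `gaussian_gram`, `pm_mmd_sq`, `sample_mixture`, and `pooled_weights`.

The sketch scores an affine combination of the mixtures by a squared RKHS distance. It compares the kernel embedding of the combination's joint law on a coordinate pair with the embedding of the product of its two marginals. The samples are pooled with per-sample weights a_k / n_k. The toy problem uses two mixtures of two product components. Here the score should be near zero at the weights that isolate one component (t = 4/3 and t = -1/3) and clearly positive for a raw mixture.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for a 1-D array of samples."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def pm_mmd_sq(k1, k2, w):
    """Biased estimate of a squared product-marginal MMD of a signed measure.

    k1, k2: Gram matrices of the two coordinates over the pooled samples.
    w: per-sample signed weights (summing to 1 for an affine combination).
    Returns ||embed(joint) - embed(marginal_1 x marginal_2)||^2 in the
    product-kernel RKHS (an illustrative definition, assumed here).
    """
    m1, m2 = k1 @ w, k2 @ w              # marginal embeddings evaluated at each sample
    joint = w @ ((k1 * k2) @ w)          # <joint, joint>
    cross = w @ (m1 * m2)                # <joint, product of marginals>
    product = (w @ m1) * (w @ m2)        # <product, product>
    return joint - 2.0 * cross + product


# Hypothetical toy data: two latent product components on one coordinate pair.
# Component A has independent N(-2, 0.5^2) coordinates; component B has N(2, 0.5^2).
rng = np.random.default_rng(0)


def sample_mixture(p_a, n):
    from_a = rng.random(n) < p_a  # latent labels: used only to simulate, never seen by the estimator
    centers = np.where(from_a, -2.0, 2.0)
    return centers[:, None] + 0.5 * rng.standard_normal((n, 2))


n = 600
q1, q2 = sample_mixture(0.8, n), sample_mixture(0.2, n)  # mixing matrix [[0.8, 0.2], [0.2, 0.8]]
z = np.vstack([q1, q2])
k1, k2 = gaussian_gram(z[:, 0]), gaussian_gram(z[:, 1])


def pooled_weights(t):
    # Affine weights (t, 1 - t) on (Q1, Q2), spread evenly over each sample set.
    return np.concatenate([np.full(n, t / n), np.full(n, (1.0 - t) / n)])


grid = np.linspace(-1.0, 2.0, 61)
scores = np.array([pm_mmd_sq(k1, k2, pooled_weights(t)) for t in grid])
t_best = grid[np.argmin(scores)]

q1_score = pm_mmd_sq(k1, k2, pooled_weights(1.0))         # raw mixture Q1
a_score = pm_mmd_sq(k1, k2, pooled_weights(4.0 / 3.0))    # (4/3)Q1 - (1/3)Q2 = A
b_score = pm_mmd_sq(k1, k2, pooled_weights(-1.0 / 3.0))   # -(1/3)Q1 + (4/3)Q2 = B
print(f"Q1: {q1_score:.4f}  A: {a_score:.4f}  B: {b_score:.4f}  argmin t: {t_best:.2f}")
```

The sketch uses a grid search only because the toy problem has a single free weight. With more mixtures, a constrained optimizer would be needed. The paper's condition-aware representative selection is not modeled here. Because both components are product distributions, the score has two near-zero minima, one per component.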
Problem

Research questions and friction points this paper is trying to address.

unlabeled finite mixtures
marginal independence
component identifiability
mixing matrix estimation
latent components
Innovation

Methods, ideas, or system contributions that make the work stand out.

marginal independence
unlabeled finite mixtures
component identifiability
affine combinations
PM-MMD