Semi-Supervised Mixture Models under the Concept of Missing at Random with Margin Confidence and Aranda Ordaz Function

📅 2026-01-21
🤖 AI Summary
This study addresses the estimation bias and degraded classification performance that semi-supervised Gaussian mixture models suffer when labels are missing at random (MAR). The authors propose an approach that jointly models the data-generating process and the label missingness mechanism. Specifically, the probability that a label is missing is modeled as a function of classification uncertainty, quantified via margin confidence. An Aranda-Ordaz link function is introduced to flexibly capture the asymmetric relationship between this uncertainty and the missingness probability. Parameter estimation is carried out with an Expectation Conditional Maximization (ECM) algorithm, and the missing labels are then imputed from the fitted mixture model. According to the reported experiments, under high proportions of MAR missing labels the proposed method improves classification accuracy and robustness and alleviates the systematic bias induced by ignoring the missingness mechanism.
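
The link between uncertainty and missingness described above can be illustrated with a minimal sketch. It rests on assumptions not stated on this page: margin confidence is taken to be the gap between the two largest class posteriors, the Aranda-Ordaz asymmetric link is used in its standard form (with `lam = 1` reducing to the logistic link), and the coefficients `xi0` and `xi1` and all function names are hypothetical. The paper's exact parameterization may differ.

```python
import math


def margin_confidence(posteriors):
    """Gap between the two largest class posteriors (assumed definition)."""
    top, second = sorted(posteriors, reverse=True)[:2]
    return top - second


def aranda_ordaz_inverse(eta, lam):
    """Inverse of the standard Aranda-Ordaz asymmetric link.

    Maps a linear predictor eta to a probability in (0, 1).
    lam = 1 gives the logistic link; lam = 0 is handled as the
    complementary log-log limit. Negative lam is not covered here.
    """
    if lam < 0:
        raise ValueError("this sketch only handles lam >= 0")
    if lam == 0.0:
        return 1.0 - math.exp(-math.exp(eta))
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def missing_probability(posteriors, xi0, xi1, lam):
    """Hypothetical missingness model: linear in margin confidence."""
    eta = xi0 + xi1 * margin_confidence(posteriors)
    return aranda_ordaz_inverse(eta, lam)


# Illustrative values only: a negative xi1 makes uncertain points
# (small margin) more likely to have a missing label.
uncertain = missing_probability([0.55, 0.45], xi0=1.0, xi1=-4.0, lam=0.5)
confident = missing_probability([0.95, 0.05], xi0=1.0, xi1=-4.0, lam=0.5)
```

With a negative slope `xi1`, `uncertain` exceeds `confident`, which is the qualitative behavior the summary describes: labels are more likely to be missing where the classifier is less sure. Varying `lam` changes how asymmetrically the probability approaches 0 and 1.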

πŸ“ Abstract
This paper presents a semi-supervised learning framework for Gaussian mixture modelling under a Missing at Random (MAR) mechanism. The method explicitly parameterizes the missingness mechanism by modelling the probability of missingness as a function of classification uncertainty. To quantify classification uncertainty, we introduce margin confidence and incorporate the Aranda Ordaz (AO) link function to flexibly capture the asymmetric relationships between uncertainty and missing probability. Based on this formulation, we develop an efficient Expectation Conditional Maximization (ECM) algorithm that jointly estimates all parameters appearing in both the Gaussian mixture model (GMM) and the missingness mechanism, and subsequently imputes the missing labels by a Bayesian classifier derived from the fitted mixture model. This method effectively alleviates the bias induced by ignoring the missingness mechanism while enhancing the robustness of semi-supervised learning. The resulting uncertainty-aware framework delivers reliable classification performance in realistic MAR scenarios with substantial proportions of missing labels.
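
The final imputation step mentioned in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes univariate Gaussian components, takes the mixture parameters as already fitted (the values below are made up), and ignores the missingness model entirely. Each missing label is filled with the component that has the largest posterior probability.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def impute_labels(xs, labels, weights, means, variances):
    """Keep observed labels; replace None with the max-posterior component."""
    completed = []
    for x, y in zip(xs, labels):
        if y is None:
            post = posteriors(x, weights, means, variances)
            y = max(range(len(post)), key=post.__getitem__)
        completed.append(y)
    return completed


# Hypothetical fitted parameters for a two-component mixture.
weights = [0.5, 0.5]
means = [0.0, 3.0]
variances = [1.0, 1.0]

xs = [-0.2, 2.9, 1.4, 1.6]
labels = [0, 1, None, None]  # None marks a missing label
imputed = impute_labels(xs, labels, weights, means, variances)
```

With equal weights and variances the decision boundary sits midway between the two means, so the unlabeled point at 1.4 is assigned to component 0 and the one at 1.6 to component 1. In the framework described above, the parameters would come from the ECM fit rather than being fixed by hand.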
Problem

Research questions and friction points this paper is trying to address.

Semi-Supervised Learning
Missing at Random
Gaussian Mixture Model
Classification Uncertainty
Label Missingness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised learning
Missing at Random
Gaussian Mixture Model
Aranda Ordaz link function
Margin confidence
Jinyang Liao
School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
Ziyang Lyu
University of New South Wales
Asymptotic Analysis, Mixed Model, Statistical Methodology, Semi-Supervised Learning