Robust Principal Components by Casewise and Cellwise Weighting

📅 2024-08-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
This paper addresses robust principal component analysis (PCA) for multivariate data contaminated by casewise outliers, cellwise outliers, and missing values. It proposes cellPCA, a unified method that handles outlying cases and outlying cells simultaneously. Its key contributions are: (i) a single objective function that combines two robust loss functions, which together mitigate casewise and cellwise outliers; (ii) residual cellmaps and enhanced outlier maps for detecting outlying cases and cells; and (iii) the casewise and cellwise influence functions of the principal subspace, together with its asymptotic distribution. The objective is minimized by an iteratively reweighted least squares (IRLS) algorithm. Extensive experiments on synthetic and real-world datasets demonstrate that cellPCA achieves over 30% higher anomaly detection accuracy and reduces principal subspace estimation error by more than 45% compared to state-of-the-art robust PCA methods.

📝 Abstract
Principal component analysis (PCA) is a fundamental tool for analyzing multivariate data. Here the focus is on dimension reduction to the principal subspace, characterized by its projection matrix. The classical principal subspace can be strongly affected by the presence of outliers. Traditional robust approaches consider casewise outliers, that is, cases generated by an unspecified outlier distribution that differs from that of the clean cases. But there may also be cellwise outliers, which are suspicious entries that can occur anywhere in the data matrix. Another common issue is that some cells may be missing. This paper proposes a new robust PCA method, called cellPCA, that can simultaneously deal with casewise outliers, cellwise outliers, and missing cells. Its single objective function combines two robust loss functions, that together mitigate the effect of casewise and cellwise outliers. The objective function is minimized by an iteratively reweighted least squares (IRLS) algorithm. Residual cellmaps and enhanced outlier maps are proposed for outlier detection. The casewise and cellwise influence functions of the principal subspace are derived, and its asymptotic distribution is obtained. Extensive simulations and two real data examples illustrate the performance of cellPCA.
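The abstract characterizes the principal subspace by its projection matrix and notes that the classical subspace can be strongly affected by outliers. The minimal NumPy sketch below illustrates this; it is not code from the paper. It builds data around a one-dimensional subspace, computes the classical projection matrix P = V V^T from the top right singular vector, and then replaces a single cell with an extreme value. The helper name `projection_matrix`, the data, and the value 500 are hypothetical choices for this demo.

```python
import numpy as np


def projection_matrix(X, k):
    """Projection matrix V V^T onto the span of the top-k classical PCs."""
    Xc = X - X.mean(axis=0)                     # classical (mean) centering
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                # p x k orthonormal loadings
    return V @ V.T


rng = np.random.default_rng(1)
n, p, k = 200, 5, 1
d = np.ones(p) / np.sqrt(p)                     # true one-dimensional subspace
X = np.outer(rng.normal(scale=3.0, size=n), d) + 0.2 * rng.normal(size=(n, p))
P_true = np.outer(d, d)

X_bad = X.copy()
X_bad[0, 0] = 500.0                             # a single cellwise outlier

err_clean = np.linalg.norm(projection_matrix(X, k) - P_true)   # Frobenius norm
err_bad = np.linalg.norm(projection_matrix(X_bad, k) - P_true)
print(f"clean: {err_clean:.3f}  one bad cell: {err_bad:.3f}")
```

On the clean data the error is close to zero. After the single bad cell, the leading component swings toward the first coordinate axis, so the error moves close to sqrt(2 - 2/5) ≈ 1.26, the Frobenius distance between the projections onto d and onto that axis.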
Problem

Develops a robust PCA method that handles both casewise and cellwise outliers
Handles missing cells in multivariate dimension reduction
Mitigates the effect of outliers by combining two robust loss functions
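The points above (casewise outliers, cellwise outliers, and missing cells, handled through robust losses and IRLS) can be illustrated with a simplified reweighting scheme. This is a hypothetical sketch, not the cellPCA objective or its published algorithm. It assumes a rank-k model X ≈ 1 mu^T + U V^T fitted by weighted alternating least squares, Huber weights on column-standardized cell residuals, a Huber weight on each case's clipped residual distance, and weight zero for missing cells. The function names, the constants b_cell = 2.5 and b_case = 1.5, and the simulated data are all assumptions.

```python
import numpy as np


def huber_weight(z, b):
    """Huber weight: 1 if |z| <= b, else b / |z| (assumed weight function)."""
    az = np.abs(z)
    return np.where(az <= b, 1.0, b / np.maximum(az, 1e-12))


def mad_scale(R, M):
    """Column-wise MAD scale of residuals R, using observed cells (M) only."""
    s = np.empty(R.shape[1])
    for j in range(R.shape[1]):
        r = R[M[:, j], j]
        s[j] = 1.4826 * np.median(np.abs(r - np.median(r)))
    return np.maximum(s, 1e-8)


def weighted_als(X0, W, mu, V, n_inner=10, ridge=1e-10):
    """Alternating weighted least squares fit of X0 ~ mu + U V^T."""
    n, p = X0.shape
    k = V.shape[1]
    for _ in range(n_inner):
        # Row step: weighted regression of each centered case on the loadings V.
        G = np.einsum("ij,ja,jb->iab", W, V, V) + ridge * np.eye(k)
        U = np.linalg.solve(G, ((W * (X0 - mu)) @ V)[..., None])[..., 0]
        # Column step: weighted regression of each variable on [1, U].
        A = np.hstack([np.ones((n, 1)), U])
        G = np.einsum("ij,ia,ib->jab", W, A, A) + ridge * np.eye(k + 1)
        B = np.linalg.solve(G, ((W * X0).T @ A)[..., None])[..., 0]
        mu, V = B[:, 0], B[:, 1:]
    return U, mu, V


def irls_pca(X, k, b_cell=2.5, b_case=1.5, n_iter=50, tol=1e-6):
    """Return an orthonormal basis of the fitted subspace, the center, and the weights."""
    X = np.asarray(X, dtype=float)
    M = ~np.isnan(X)                          # observed-cell mask
    X0 = np.where(M, X, 0.0)                  # placeholder values; they get weight 0
    # Simple robust start: median centering, winsorized cells, truncated SVD.
    mu = np.nanmedian(X, axis=0)
    s = mad_scale(X0 - mu, M)
    Xw = np.clip((X0 - mu) / s, -b_cell, b_cell) * s * M
    V = np.linalg.svd(Xw, full_matrices=False)[2][:k].T
    U = Xw @ V                                # initial scores
    W = M.astype(float)
    for _ in range(n_iter):
        # Reweighting step: weights from the current standardized residuals.
        R = X0 - mu - U @ V.T
        Z = np.where(M, R / mad_scale(R, M), 0.0)
        cell_w = huber_weight(Z, b_cell)
        # Case distance from clipped residuals, so one bad cell does not flag a row.
        t = np.sqrt((np.minimum(Z**2, b_cell**2) * M).sum(1) / np.maximum(M.sum(1), 1))
        W_new = M * cell_w * huber_weight(t, b_case)[:, None]
        converged = np.max(np.abs(W_new - W)) < tol
        W = W_new
        if converged:
            break
        # Weighted least squares step with re-orthonormalized loadings.
        U, mu, V = weighted_als(X0, W, mu, np.linalg.qr(V)[0])
    return np.linalg.qr(V)[0], mu, W


rng = np.random.default_rng(0)
n, p, k = 300, 6, 2
V_true = np.linalg.qr(rng.normal(size=(p, k)))[0]
X = (rng.normal(size=(n, k)) * [5.0, 3.0]) @ V_true.T + 0.3 * rng.normal(size=(n, p))
bad_cells = rng.random((n, p)) < 0.03
X[bad_cells] = 20.0                                  # cellwise outliers
w = rng.normal(size=p)
w -= V_true @ (V_true.T @ w)                         # direction orthogonal to the subspace
X[:6] = 20.0 * w / np.linalg.norm(w)                 # casewise outliers
X[rng.random((n, p)) < 0.05] = np.nan                # missing cells

Q, mu, W = irls_pca(X, k)
err = np.linalg.norm(Q @ Q.T - V_true @ V_true.T)    # Frobenius distance of projections
print(f"subspace error: {err:.3f}")
```

Because Huber weights only bound the influence of a bad cell instead of removing it, outlying cells keep a small positive weight. A redescending weight such as Tukey's biweight would reject them fully, but it depends more heavily on a good starting value.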
Innovation

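Combines two robust loss functions in a single objective to mitigate casewise and cellwise outliers
Minimizes the objective function with an iteratively reweighted least squares (IRLS) algorithm
Proposes residual cellmaps and enhanced outlier maps for outlier detection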
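Residual cellmaps and outlier maps both start from the residuals of a fitted principal subspace. The hypothetical sketch below only approximates that idea and does not reproduce the paper's enhanced outlier map. It standardizes each cell residual by a column-wise MAD scale and flags cells beyond the 0.995 normal quantile (about 2.576), and it flags cases whose orthogonal distance exceeds the cutoff commonly used with ROBPCA outlier maps, based on a normal approximation to OD^(2/3). The function name `residual_flags`, the cutoff levels, and the demo data are assumptions; for illustration, the noiseless signal stands in for the output of a robust fit.

```python
import numpy as np
from statistics import NormalDist


def mad(x):
    """Normal-consistent median absolute deviation."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))


def residual_flags(X, X_fit, alpha_cell=0.99, alpha_case=0.975):
    """Flag outlying cells and cases from the residuals X - X_fit."""
    R = X - X_fit
    s = np.array([mad(R[:, j]) for j in range(R.shape[1])])
    Z = R / s                                                # standardized cell residuals
    cell_cut = NormalDist().inv_cdf((1 + alpha_cell) / 2)    # about 2.576 for 0.99
    od = np.linalg.norm(R, axis=1)                           # orthogonal distance per case
    od23 = od ** (2 / 3)                                     # approximately normal transform
    od_cut = (np.median(od23) + mad(od23) * NormalDist().inv_cdf(alpha_case)) ** 1.5
    return Z, np.abs(Z) > cell_cut, od, od > od_cut


rng = np.random.default_rng(2)
n, p = 200, 5
d = np.ones(p) / np.sqrt(p)
signal = np.outer(rng.normal(scale=3.0, size=n), d)         # rank-1 clean structure
X = signal + 0.2 * rng.normal(size=(n, p))
X[3, 1] += 10.0                                              # one outlying cell
w = rng.normal(size=p)
w -= d * (d @ w)                                             # direction orthogonal to d
X[7] += 5.0 * w / np.linalg.norm(w)                          # one outlying case

Z, bad_cells, od, bad_cases = residual_flags(X, signal)
print(bad_cells[3, 1], bad_cases[7], bad_cells.mean())
```

Row 3 is also flagged as a case here, because a single large cell inflates its orthogonal distance. A case-level distance alone cannot tell such a row apart from a genuinely outlying case, which is why a cell-level view of the standardized residuals is useful alongside it.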
Fabio Centofanti
Department of Industrial Engineering, University of Naples Federico II, Naples, Italy
Mia Hubert
Professor of Statistics, KU Leuven
Robust statistics, Outlier detection, Depth
Peter J. Rousseeuw
Section of Statistics and Data Science, Department of Mathematics, KU Leuven, Belgium