🤖 AI Summary
Existing criteria for estimating the rank of noisy, outlier-contaminated matrices are either insufficiently robust or computationally expensive. This paper proposes the Divergence Information Criterion for Matrix Rank (DICMR), a new criterion that brings density power divergence (DPD) into rank selection. DICMR is robust, specifically first-order B-robust, and computationally efficient: it has a closed-form solution that avoids iterative optimization, data splitting, and resampling. The paper formulates a model selection objective based on DPD and derives asymptotic bounds on the probabilities of overestimating and underestimating the rank. Empirical evaluations show that DICMR attains accuracy comparable to robust cross-validation on principal component analysis and matrix completion tasks at substantially lower computational cost. On microarray data imputation, DICMR outperforms several state-of-the-art methods.
📝 Abstract
Estimating the true rank of a noisy data matrix is a fundamental problem underlying techniques such as principal component analysis and matrix completion. Existing rank estimation criteria, including information-based and cross-validation methods, are either highly sensitive to outliers or computationally demanding when combined with robust estimators. This paper proposes a new criterion, the Divergence Information Criterion for Matrix Rank (DICMR), that achieves both robustness and computational simplicity. Derived from the density power divergence framework, DICMR inherits that framework's robustness properties while remaining simple to compute. We provide asymptotic bounds on its overestimation and underestimation probabilities and demonstrate first-order B-robustness of the criterion. Extensive simulations show that DICMR delivers accuracy comparable to robustified cross-validation methods, but at far lower computational cost. We also present a real-data application to microarray imputation, in which DICMR outperforms several state-of-the-art algorithms, to further demonstrate its practical utility.
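As a rough illustration of why a criterion built on density power divergence can resist outliers, the sketch below evaluates the standard empirical DPD objective for a zero-mean Gaussian noise model on a vector of residuals. This is not the DICMR formula, which the abstract does not state. The function names `dpd_gaussian_loss` and `max_single_point_shift`, the Gaussian noise assumption, and the tuning parameter `alpha` are assumptions made here for illustration only.

```python
import numpy as np


def dpd_gaussian_loss(residuals, sigma, alpha):
    """Empirical DPD objective for a N(0, sigma^2) model (requires alpha > 0).

    Computes  integral(f^(1+alpha)) - (1 + 1/alpha) * mean(f(r_i)^alpha),
    where f is the N(0, sigma^2) density. Hypothetical helper, not DICMR.
    """
    r = np.asarray(residuals, dtype=float).ravel()
    # Shared factor (2*pi*sigma^2)^(-alpha/2) appearing in both terms.
    norm = (2.0 * np.pi * sigma**2) ** (-alpha / 2.0)
    # Closed form of the integral of f^(1+alpha) for a Gaussian density.
    integral_term = norm / np.sqrt(1.0 + alpha)
    # Average of f(r_i)^alpha; large residuals contribute almost nothing.
    data_term = norm * np.mean(np.exp(-alpha * r**2 / (2.0 * sigma**2)))
    return integral_term - (1.0 + 1.0 / alpha) * data_term


def max_single_point_shift(sigma, alpha, n):
    """Largest change in the loss that altering one of n residuals can cause."""
    norm = (2.0 * np.pi * sigma**2) ** (-alpha / 2.0)
    return (1.0 + 1.0 / alpha) * norm / n
```

Because each residual enters only through a bounded exponential weight, replacing a single entry with an arbitrarily large value moves this loss by at most `max_single_point_shift(sigma, alpha, n)`, whereas the mean squared residual can grow without bound. A rank criterion would additionally need a complexity penalty and a rule for obtaining the residuals at each candidate rank, neither of which is sketched here.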