🤖 AI Summary
This study addresses the problem of comparing the importance of distinct groups of predictors in classification problems by proposing a statistical inference framework based on the Categorical Gini Correlation (CGC). The framework tests differences in CGC between predictor groups that may have arbitrary and unequal dimensions and may be dependent on one another. The test statistic is shown to be asymptotically normal under both the null and alternative hypotheses, and the resulting test is shown to be consistent. A nonparametric bootstrap procedure is provided as an alternative to inference based on the asymptotic distribution. Simulation studies and real-data analyses, including applications to breast cancer and human activity recognition datasets, demonstrate the effectiveness of the proposed framework.
📝 Abstract
This article proposes an inferential framework for comparing predictor importance in classification problems with categorical response variables. The approach is based on the categorical Gini correlation (CGC) proposed by Dang et al. (2020), a measure of dependence between numerical predictors and categorical outcomes. Predictor importance is evaluated by testing differences in CGCs across competing predictor groups. The proposed methodology accommodates predictors of arbitrary and unequal dimensions and allows for dependence between predictor groups. Asymptotic normality of the test statistic is established under both the null and alternative hypotheses, and the resulting test is shown to be consistent. In addition to the asymptotic distribution, a nonparametric bootstrap procedure is developed as an alternative approach to inference. Simulation studies, along with applications to breast cancer and human activity recognition datasets, demonstrate the effectiveness of the proposed framework.
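To make the idea of comparing CGCs across predictor groups concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual sample form of the CGC, (Δ − Σ_k p_k Δ_k) / Δ, where Δ is the mean pairwise Euclidean distance over all observations, Δ_k is the same quantity within class k, and p_k is the class proportion. The function names (`mean_pairwise_distance`, `sample_cgc`, `bootstrap_cgc_difference`) are hypothetical, and the percentile bootstrap interval is an illustrative choice; the abstract does not specify which bootstrap variant the article uses.

```python
import numpy as np


def mean_pairwise_distance(x):
    """Average Euclidean distance over all distinct pairs of rows of x."""
    n = x.shape[0]
    if n < 2:
        return 0.0  # a single observation has no pairs
    diffs = x[:, None, :] - x[None, :, :]          # shape (n, n, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))     # shape (n, n), zero diagonal
    return dists.sum() / (n * (n - 1))


def sample_cgc(x, y):
    """Sample CGC between numerical predictors x and categorical labels y.

    Assumed form: (overall - weighted within-class) / overall mean distance.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y)
    overall = mean_pairwise_distance(x)
    labels, counts = np.unique(y, return_counts=True)
    within = sum(
        (c / len(y)) * mean_pairwise_distance(x[y == k])
        for k, c in zip(labels, counts)
    )
    return (overall - within) / overall


def bootstrap_cgc_difference(x1, x2, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for CGC(x1, y) - CGC(x2, y).

    Rows are resampled jointly, so any dependence between the two
    predictor groups is preserved in each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y)
    n = len(y)
    observed = sample_cgc(x1, y) - sample_cgc(x2, y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        diffs[b] = sample_cgc(x1[idx], y[idx]) - sample_cgc(x2[idx], y[idx])
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return observed, (lower, upper)
```

Under these assumptions, an interval that excludes zero suggests that one predictor group has a larger CGC with the response than the other. The pairwise-distance computation uses O(n²d) memory, so this sketch is suited only to modest sample sizes.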