🤖 AI Summary
Machine learning models often exhibit heterogeneous performance across subpopulations (e.g., intersections of gender and marital status), posing risks to fairness and reliability.
Method: This paper proposes SubROC, a framework based on Exceptional Model Mining that uses ROC and PR AUC to detect exceptional subgroups, i.e., interpretable subpopulations on which a classification model performs markedly worse (or better) than on the population as a whole. It combines search-space pruning for fast exhaustive search, control for class imbalance, adjustment for redundant patterns, and significance testing to identify such subgroups efficiently and reliably (e.g., *sex=female ∧ marital_status=married*).
Contribution/Results: Case studies and comparative analyses across multiple datasets illustrate the framework's practical benefits. The subgroups it finds can inform decisions about which subpopulations a model is safe to deploy on and where more training data is required. The implementation is open source and designed to be easy to use.
📝 Abstract
Machine learning (ML) is increasingly employed in real-world applications such as medicine and economics, and thus potentially affects large populations. However, ML models often do not perform homogeneously across such populations, which results in subgroups (e.g., sex=female AND marital_status=married) where the model underperforms or, conversely, is particularly accurate. Identifying and describing such subgroups can support practical decisions about which subpopulations a model is safe to deploy on and where more training data is required. The potential of identifying and analyzing such subgroups has been recognized; however, an efficient and coherent framework for effective search is missing. Consequently, we introduce SubROC, an open-source, easy-to-use framework based on Exceptional Model Mining for reliably and efficiently finding strengths and weaknesses of classification models in the form of interpretable population subgroups. SubROC incorporates common evaluation measures (ROC and PR AUC), efficient search space pruning for fast exhaustive subgroup search, control for class imbalance, adjustment for redundant patterns, and significance testing. We illustrate the practical benefits of SubROC in case studies as well as in comparative analyses across multiple datasets.
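To make the core idea concrete, the following is a minimal sketch of comparing a model's ROC AUC on one subgroup against its ROC AUC on the whole population. It is not SubROC's API: the function names `roc_auc` and `subgroup_auc_gap`, the toy records, and the scores are all hypothetical and chosen only for illustration. The sketch also leaves out the pruning, class-imbalance control, redundancy adjustment, and significance testing that the abstract describes.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc_gap(
    rows: Sequence[Row],
    labels: Sequence[int],
    scores: Sequence[float],
    condition: Callable[[Row], bool],
) -> float:
    """Subgroup ROC AUC minus overall ROC AUC (negative means the model
    ranks worse inside the subgroup than on the full population)."""
    idx: List[int] = [i for i, r in enumerate(rows) if condition(r)]
    sub_auc = roc_auc([labels[i] for i in idx], [scores[i] for i in idx])
    return sub_auc - roc_auc(labels, scores)


# Hypothetical toy data: attributes, true labels, and model scores.
rows = [
    {"sex": "female", "marital_status": "married"},
    {"sex": "female", "marital_status": "married"},
    {"sex": "female", "marital_status": "married"},
    {"sex": "female", "marital_status": "married"},
    {"sex": "male", "marital_status": "single"},
    {"sex": "male", "marital_status": "single"},
    {"sex": "female", "marital_status": "single"},
    {"sex": "male", "marital_status": "married"},
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.4, 0.6, 0.7, 0.5, 0.9, 0.1, 0.8, 0.2]

# Subgroup description: sex=female AND marital_status=married.
gap = subgroup_auc_gap(
    rows,
    labels,
    scores,
    lambda r: r["sex"] == "female" and r["marital_status"] == "married",
)
print(f"overall AUC = {roc_auc(labels, scores):.3f}, subgroup gap = {gap:+.3f}")
```

On this toy data the overall ROC AUC is 0.875 while the subgroup's is 0.5, so the gap is -0.375. A gap like this on a small subgroup can arise by chance, which is why a search over many candidate subgroups would need the significance testing mentioned above before a subgroup is reported.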