🤖 AI Summary
This study addresses group fairness disparities in machine learning–based credit scoring by systematically evaluating fairness-aware methods in realistic financial settings. We conduct unified, reproducible experiments across multi-source credit datasets (German and UK Credit), establishing a comprehensive evaluation framework that encompasses preprocessing techniques (Reweighting, SMOTE-Fair), in-processing algorithms (Adversarial Debiasing, FairGBM), and multidimensional fairness metrics (Statistical Parity Difference, Equal Opportunity Difference, Average Odds Difference). Crucially, we propose the first joint fairness–utility assessment paradigm tailored to credit scoring. Results show that most fairness interventions significantly degrade model utility, reducing AUC by 3.2% on average. In contrast, FairGBM achieves the best trade-off: under an EOD constraint of 0.05, it maintains AUC above 0.72, demonstrating the effectiveness of task-specific fair modeling for credit risk assessment.
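As an illustrative sketch (not code from the study), the three group-fairness metrics named above can be computed directly from binary predictions and a protected-attribute indicator. The function names and the convention that group 0 is the unprivileged group are assumptions for the example:

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """Selection rate, TPR, and FPR within the subgroup selected by mask."""
    yt, yp = y_true[mask], y_pred[mask]
    sel = yp.mean()                                      # P(Y_hat=1 | group)
    tpr = yp[yt == 1].mean() if (yt == 1).any() else 0.0  # P(Y_hat=1 | Y=1, group)
    fpr = yp[yt == 0].mean() if (yt == 0).any() else 0.0  # P(Y_hat=1 | Y=0, group)
    return sel, tpr, fpr

def fairness_metrics(y_true, y_pred, group):
    """SPD, EOD, AOD between group==0 (unprivileged) and group==1 (privileged)."""
    s0, t0, f0 = group_rates(y_true, y_pred, group == 0)
    s1, t1, f1 = group_rates(y_true, y_pred, group == 1)
    return {
        "SPD": s0 - s1,                        # statistical parity difference
        "EOD": t0 - t1,                        # equal opportunity difference
        "AOD": 0.5 * ((f0 - f1) + (t0 - t1)),  # average odds difference
    }
```

A value of 0 for each metric indicates parity between the two groups; the EOD constraint reported above corresponds to requiring |EOD| < 0.05.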
📝 Abstract
Credit scoring is an essential task for financial organizations and commercial banks, especially in the context of digital transformation. Machine learning techniques are commonly used to evaluate customers' creditworthiness. However, the predicted outcomes of machine learning models can be biased with respect to protected attributes, such as race or gender. Numerous fairness-aware machine learning models and fairness measures have been proposed. Nevertheless, their performance in the context of credit scoring has not been thoroughly investigated. In this paper, we present a comprehensive experimental study of fairness-aware machine learning in credit scoring. The study explores key aspects of credit scoring, including financial datasets, predictive models, and fairness measures. We also provide a detailed evaluation of fairness-aware predictive models and fairness measures on widely used financial datasets.
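To make the preprocessing family of fairness-aware methods concrete, here is a minimal sketch of Kamiran–Calders reweighting (one of the techniques evaluated in the summary above). The function name is an assumption for illustration; the weight formula w(a, y) = P(A=a) P(Y=y) / P(A=a, Y=y) makes the label statistically independent of the protected attribute under the weighted distribution:

```python
import numpy as np

def reweighting_weights(y, a):
    """Per-sample weights that decorrelate label y from protected attribute a.

    Each (a, y) cell gets weight = expected count under independence
    divided by its observed count.
    """
    n = len(y)
    w = np.zeros(n, dtype=float)
    for av in np.unique(a):
        for yv in np.unique(y):
            cell = (a == av) & (y == yv)
            if cell.any():  # skip empty cells to avoid division by zero
                w[cell] = ((a == av).sum() * (y == yv).sum()) / (n * cell.sum())
    return w
```

Training any standard classifier with these sample weights (e.g. via a `sample_weight` argument) is what makes the approach model-agnostic, which is why reweighting is grouped with the preprocessing techniques rather than the in-processing algorithms.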