🤖 AI Summary
Deep Ensembles improve predictive performance, yet their impact on algorithmic fairness remains poorly understood. This paper identifies a "disparate benefits effect": protected groups (e.g., defined by gender or race) receive unequal performance improvements from ensembling, which the authors trace to per-group differences in the predictive diversity of ensemble members. Experiments on popular facial analysis and medical imaging datasets with annotated protected attributes show that the effect arises for multiple established group fairness metrics, including statistical parity and equal opportunity. Among the mitigation approaches evaluated, post-processing proves effective at reducing the resulting unfairness while preserving the improved performance of Deep Ensembles.
📝 Abstract
Ensembles of Deep Neural Networks (Deep Ensembles) are widely used as a simple way to boost predictive performance. However, their impact on algorithmic fairness is not yet well understood. Algorithmic fairness investigates how a model's performance varies across different groups, typically defined by protected attributes such as age, gender, or race. In this work, we investigate the interplay between the performance gains from Deep Ensembles and fairness. Our analysis reveals that they unevenly favor different groups, in what we refer to as a disparate benefits effect. We empirically investigate this effect with Deep Ensembles applied to popular facial analysis and medical imaging datasets, where protected group attributes are given, and find that it occurs for multiple established group fairness metrics, including statistical parity and equal opportunity. Furthermore, we identify the per-group difference in predictive diversity of ensemble members as the potential cause of the disparate benefits effect. Finally, we evaluate different approaches to reduce unfairness due to the disparate benefits effect. Our findings show that post-processing is an effective method to mitigate this unfairness while preserving the improved performance of Deep Ensembles.
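The two group fairness metrics named above can be made concrete with a small sketch. The snippet below is a minimal NumPy illustration with made-up data, not the paper's code: it averages hypothetical ensemble member probabilities, thresholds them into predictions, and reports the statistical parity gap (difference in positive prediction rates between groups) and the equal opportunity gap (difference in true positive rates between groups).

```python
import numpy as np

def statistical_parity_gap(y_pred, group):
    """Absolute difference in positive prediction rates between groups 0 and 1."""
    rate0 = y_pred[group == 0].mean()
    rate1 = y_pred[group == 1].mean()
    return abs(rate0 - rate1)

def equal_opportunity_gap(y_pred, y_true, group):
    """Absolute difference in true positive rates between groups 0 and 1."""
    tpr0 = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr1 = y_pred[(group == 1) & (y_true == 1)].mean()
    return abs(tpr0 - tpr1)

# Hypothetical two-member ensemble: average probabilities, threshold at 0.5.
member_probs = np.array([
    [0.9, 0.2, 0.7, 0.4, 0.8, 0.3],  # member 1
    [0.8, 0.3, 0.6, 0.5, 0.9, 0.2],  # member 2
])
y_pred = (member_probs.mean(axis=0) >= 0.5).astype(int)
y_true = np.array([1, 0, 1, 1, 1, 0])  # ground-truth labels (made up)
group  = np.array([0, 0, 0, 1, 1, 1])  # protected-group membership (made up)

print(statistical_parity_gap(y_pred, group))          # -> 0.333...
print(equal_opportunity_gap(y_pred, y_true, group))   # -> 0.5
```

A nonzero gap on the ensemble prediction that is larger than the gaps of the individual members is exactly the kind of disparate benefit the paper studies; post-processing methods then adjust per-group decision thresholds to shrink these gaps.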