🤖 AI Summary
This study examines racial fairness in neuroimaging normative modeling and exposes the risk of clinical misclassification when the demographics of the reference population do not match those of the target cohort. Methodologically, it combines normative modeling, multivariate regression, interpretable bias analysis, and counterfactual evaluation. It introduces “demographic alignment” as a core fairness dimension and uses deviation scores, derived from normative residuals, to predict self-reported race, thereby quantifying systemic racial bias in a multivariate framework. Empirical evaluation across three state-of-the-art normative models of structural brain imaging consistently reveals significant racial bias, and increasing sample size alone does not mitigate it. Including race as a covariate is likewise insufficient for fairness. Equitable inference instead requires reference populations that are more demographically representative, together with stratified, population-specific calibration.
📝 Abstract
Reference classes in healthcare establish healthy norms, such as pediatric growth charts of height and weight, and are used to chart deviations from these norms, which represent potential clinical risk. How the demographics of the reference class influence the clinical interpretation of deviations is unknown. Using normative modeling, a method for building reference classes, we evaluate the fairness (racial bias) of reference models of structural brain images that are widely used in psychiatry and neurology. We test whether including race in the model creates fairer models. We predict self-reported race using the deviation scores from three different reference class normative models, to better understand bias in an integrated, multivariate sense. Across all of these tasks, we uncover racial disparities that are not easily addressed with existing data or commonly used modeling techniques. Our work suggests that deviations from the norm could be due to demographic mismatch with the reference class, and assigning clinical meaning to these deviations should be done with caution. Our approach also suggests that acquiring more representative samples is an urgent research priority.
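The idea of predicting group membership from deviation scores can be illustrated with a minimal sketch. It is not the paper's pipeline: the abstract does not specify the normative models or the classifier, so this example assumes a per-feature linear regression on age as the normative model and a nearest-centroid classifier. All data are synthetic, and the group labels "A" and "B", the number of features, and the `GROUP_SHIFT` offset are hypothetical values chosen only for illustration.

```python
import random
import statistics

random.seed(0)

N_FEATURES = 5      # hypothetical number of brain measures
GROUP_SHIFT = 0.8   # synthetic offset for group "B"; not an empirical value


def simulate(n, group):
    """Return n synthetic subjects as (age, features, group) tuples."""
    subjects = []
    for _ in range(n):
        age = random.uniform(20, 80)
        shift = GROUP_SHIFT if group == "B" else 0.0
        feats = [10.0 - 0.05 * age + shift + random.gauss(0, 1)
                 for _ in range(N_FEATURES)]
        subjects.append((age, feats, group))
    return subjects


def fit_normative(reference):
    """Fit one least-squares line (feature ~ age) per feature, plus residual SD."""
    ages = [s[0] for s in reference]
    mean_age = statistics.fmean(ages)
    ss_age = sum((a - mean_age) ** 2 for a in ages)
    models = []
    for j in range(N_FEATURES):
        ys = [s[1][j] for s in reference]
        mean_y = statistics.fmean(ys)
        slope = sum((a - mean_age) * (y - mean_y)
                    for a, y in zip(ages, ys)) / ss_age
        intercept = mean_y - slope * mean_age
        resid = [y - (intercept + slope * a) for a, y in zip(ages, ys)]
        models.append((intercept, slope, statistics.pstdev(resid)))
    return models


def deviations(models, subject):
    """Deviation (z) scores: residual from the norm divided by residual SD."""
    age, feats, _ = subject
    return [(y - (b0 + b1 * age)) / sd
            for y, (b0, b1, sd) in zip(feats, models)]


def nearest_centroid_accuracy(train, test):
    """Classify each test z-vector by its closest group centroid."""
    cents = {}
    for g in ("A", "B"):
        rows = [z for z, lab in train if lab == g]
        cents[g] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(z):
        return min(cents, key=lambda g: sum((a - b) ** 2
                                            for a, b in zip(z, cents[g])))

    return sum(predict(z) == lab for z, lab in test) / len(test)


# Reference class dominated by group A (demographic mismatch with the cohort).
reference = simulate(900, "A") + simulate(100, "B")
models = fit_normative(reference)

# Balanced target cohort, scored against that reference class.
cohort = simulate(400, "A") + simulate(400, "B")
random.shuffle(cohort)
scored = [(deviations(models, s), s[2]) for s in cohort]
train, test = scored[:400], scored[400:]

mean_z = {g: statistics.fmean([z for zs, lab in scored if lab == g for z in zs])
          for g in ("A", "B")}
accuracy = nearest_centroid_accuracy(train, test)
print("mean deviation per group:", mean_z)
print("group prediction accuracy from deviations:", accuracy)
```

In this toy setup, the under-represented group receives systematically shifted deviation scores even though no subject is simulated as "abnormal", and a classifier that recovers group labels above chance from those scores indicates that the deviations encode group membership. Accuracy near 0.5 would suggest the opposite.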