🤖 AI Summary
This study addresses the limited efficacy of existing bias mitigation techniques in predictive models trained on government data, techniques that often fail to achieve their fairness objectives. Using crime rate prediction with Bristol City Council data as a case study, the work systematically investigates the root causes of this failure, attributing it to structural and historical biases embedded in the data rather than to flaws in model design. Through cross-cutting fairness experiments that combine multiple mainstream predictive models with standard fairness interventions, the analysis examines dimensions such as data distribution shifts, the accumulation of historical bias, and reporting delays. The findings expose the inadequacy of approaches that address only a single sensitive attribute and demonstrate that current technical methods cannot eliminate the deeply entrenched inequities in governmental datasets, offering empirical evidence and a cautionary note for algorithmic deployment in public policy contexts.
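To make the "cross-cutting" setup concrete, the sketch below crosses two off-the-shelf classifiers with a baseline and a simple reweighing-style intervention, then reports accuracy and the demographic parity gap for each combination. This is an illustrative reconstruction only: the models, features, synthetic data, and helper names are assumptions and are not taken from the study.

```python
# Hypothetical sketch: crossing predictive models with fairness interventions
# and measuring the demographic parity gap on a held-out split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000

# Synthetic stand-in for a government dataset: one binary sensitive attribute
# and features correlated with it, so historical bias leaks into the labels.
sensitive = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5)) + 0.6 * sensitive[:, None]
y = (X[:, 0] + 0.8 * sensitive + rng.normal(scale=0.5, size=n) > 0.7).astype(int)

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, sensitive, test_size=0.3, random_state=0
)

def reweighing_weights(y, s):
    """Kamiran-Calders style reweighing: weight each (group, label) cell so
    that labels look statistically independent of the sensitive attribute."""
    w = np.ones(len(y))
    for g in np.unique(s):
        for lbl in np.unique(y):
            mask = (s == g) & (y == lbl)
            expected = (s == g).mean() * (y == lbl).mean()
            observed = mask.mean()
            w[mask] = expected / observed
    return w

def demographic_parity_gap(y_pred, s):
    """Absolute difference in positive prediction rates between the groups."""
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
mitigations = {
    "baseline": None,
    "reweighing": reweighing_weights(y_tr, s_tr),
}

for model_name, model in models.items():
    for mit_name, weights in mitigations.items():
        model.fit(X_tr, y_tr, sample_weight=weights)
        pred = model.predict(X_te)
        gap = demographic_parity_gap(pred, s_te)
        acc = (pred == y_te).mean()
        print(f"{model_name:>14} | {mit_name:>10} | acc={acc:.3f} | DP gap={gap:.3f}")
```

Because the synthetic labels encode the sensitive attribute directly, the mitigation typically narrows the gap without closing it, which mirrors the study's broader point that the residual unfairness originates in the data rather than in any particular model choice.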
📝 Abstract
The potential for bias and unfairness in AI-supported government services raises ethical and legal concerns. Using crime rate prediction with Bristol City Council data as a case study, we examine how these issues persist. Rather than auditing real-world deployed systems, our goal is to understand why widely adopted bias mitigation techniques often fail when applied to government data. Our findings reveal that bias mitigation approaches applied to government data are not always effective -- not because of flaws in model architecture or metric selection, but because of the inherent properties of the data itself. By comparing a comprehensive set of models and fairness methods, our experiments consistently show that mitigation efforts cannot overcome the unfairness embedded in the data, further reinforcing that the origin of bias lies in the structure and history of government datasets. We then explore the reasons for mitigation failures in predictive models on government data and highlight potential sources of unfairness posed by data distribution shifts, the accumulation of historical bias, and delays in data release. Through a set of intersectional fairness experiments, we also uncover the blind spots of fairness analyses and bias mitigation methods that target only a single sensitive feature. Although this study is limited to one city, the findings are highly suggestive and serve as an early warning that biases in government data may persist even under standard mitigation methods.
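The single-attribute blind spot mentioned in the abstract can be illustrated with a small constructed example: predictions whose positive rates are identical for each sensitive attribute taken alone, yet sharply unequal once the attributes are crossed. The attribute names, rates, and data below are hypothetical and do not come from the Bristol City Council dataset.

```python
# Illustrative construction of an intersectional blind spot: marginal audits
# on each attribute look fair while specific intersections diverge strongly.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 8000
df = pd.DataFrame({
    "area_deprivation": rng.integers(0, 2, size=n),  # 0 = low, 1 = high (hypothetical)
    "age_band": rng.integers(0, 2, size=n),          # 0 = under 30, 1 = 30+ (hypothetical)
})

# Positive prediction rate is 0.5 when the two attributes agree and 0.1 when
# they differ, so each single-attribute marginal averages out to ~0.30.
same = (df["area_deprivation"] == df["age_band"]).to_numpy()
rate = np.where(same, 0.5, 0.1)
df["predicted_positive"] = rng.random(n) < rate

# Single-attribute audits: both groups look nearly identical (~0.30 each).
print(df.groupby("area_deprivation")["predicted_positive"].mean())
print(df.groupby("age_band")["predicted_positive"].mean())

# Intersectional audit: the 0.5 vs 0.1 gap appears only when both attributes
# are crossed, which a single-attribute fairness check would never flag.
print(df.groupby(["area_deprivation", "age_band"])["predicted_positive"].mean())
```

A mitigation method tuned to equalize rates on one attribute at a time would report success on this data while leaving the intersectional disparity untouched, which is the kind of blind spot the intersectional experiments are designed to surface.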