Model Multiplicity and Predictive Arbitrariness in Recidivism Risk Assessment

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the arbitrariness in recidivism risk prediction that arises from model multiplicity, using a machine learning-based decision support system that has been in operation for over 15 years. The authors translate legal rules into a labeling algorithm to construct a dataset of thousands of inmate releases, train interpretable models, and analyze how structural diversity among models translates into predictive disagreement. They derive a tight lower bound on the expected predictive agreement of any finite set of models and show empirically that agreement among similarly accurate models can substantially exceed this worst-case bound. A simple policy that assigns each inmate the minimum risk score across the models is found to be effective for addressing predictive arbitrariness. The resulting models improve predictive performance, reduce error-rate disparities between groups, and ensure that rehabilitative progress lowers risk scores.
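The minimum-risk policy mentioned in the summary can be illustrated with a minimal sketch. It assumes each model outputs one numeric risk score per individual. The function name `min_risk_scores` and the input layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def min_risk_scores(scores_by_model: Sequence[Sequence[float]]) -> List[float]:
    """Return, for each individual, the lowest risk score across all models.

    scores_by_model[m][i] is the (hypothetical) risk score that model m
    assigns to individual i. All models must score the same individuals.
    """
    if not scores_by_model:
        raise ValueError("at least one model is required")
    n = len(scores_by_model[0])
    if any(len(scores) != n for scores in scores_by_model):
        raise ValueError("all models must score the same number of individuals")
    # zip(*...) regroups the scores per individual; min picks the most lenient one.
    return [min(per_individual) for per_individual in zip(*scores_by_model)]


# Illustrative scores from three similarly accurate models for three individuals.
example_scores = [
    [0.2, 0.7, 0.5],
    [0.3, 0.6, 0.5],
    [0.1, 0.8, 0.4],
]
print(min_risk_scores(example_scores))  # [0.1, 0.6, 0.4]
```

Under this policy, an individual is assigned a high score only if every model in the set assigns a high score, so the outcome no longer depends on which of the similarly accurate models happens to be deployed.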

📝 Abstract

Prediction tasks over individual futures, which are inherently noisy, often admit multiple similarly accurate models. When these models produce different predictions for the same individual, they raise concerns of arbitrariness in decision-making. How severe can this arbitrariness be, in theory and in practice? How can it be resolved to support high-stakes risk assessment? We address these questions through a study of a machine learning-based decision support system for recidivism risk assessment that has been in use for over 15 years. By translating complex legal rules into an algorithm for labeling post-release outcomes (recidivist or non-recidivist), we first construct a dataset of thousands of inmate releases. Using this dataset, we learn interpretable models that improve predictive performance, reduce error-rate disparities between groups, and ensure that rehabilitative progress lowers risk scores. Next, we study predictive multiplicity by first deriving a tight lower bound on the expected predictive agreement of any finite set of models over a dataset, and then evaluating the extent to which structural diversity (e.g., different model coefficients) within this set translates to predictive multiplicity (i.e., different predictions for the same individual). Our experiments indicate that the existence of many similarly accurate models with comparable error-rate disparities does not necessarily translate into severe predictive multiplicity. Empirically, similarly performant models can exhibit substantially higher predictive agreement than worst-case theoretical guarantees suggest. We find that a simple policy that assigns each inmate the lowest risk among these models is effective for addressing predictive arbitrariness.
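As a rough illustration of how predictive agreement within a finite set of models might be measured empirically, the sketch below computes two simple quantities from binary predictions: the mean pairwise agreement and the fraction of individuals on whom the models conflict. These definitions and the names `pairwise_agreement` and `conflict_rate` are assumptions made for illustration. The abstract does not state the paper's exact metric or its lower bound, so neither is reproduced here.

```python
from itertools import combinations
from typing import Sequence


def _check(preds_by_model: Sequence[Sequence[int]]) -> int:
    """Validate the input and return the number of individuals."""
    if len(preds_by_model) < 2:
        raise ValueError("at least two models are required")
    n = len(preds_by_model[0])
    if n == 0 or any(len(p) != n for p in preds_by_model):
        raise ValueError("models must predict for the same non-empty dataset")
    return n


def pairwise_agreement(preds_by_model: Sequence[Sequence[int]]) -> float:
    """Mean, over all model pairs, of the fraction of identical predictions."""
    n = _check(preds_by_model)
    rates = [
        sum(a == b for a, b in zip(p, q)) / n
        for p, q in combinations(preds_by_model, 2)
    ]
    return sum(rates) / len(rates)


def conflict_rate(preds_by_model: Sequence[Sequence[int]]) -> float:
    """Fraction of individuals who receive differing labels from the models."""
    n = _check(preds_by_model)
    return sum(len(set(labels)) > 1 for labels in zip(*preds_by_model)) / n


# Illustrative labels (1 = recidivist, 0 = non-recidivist) from three models.
example_preds = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
print(pairwise_agreement(example_preds))  # 2/3: pair rates are 0.75, 0.75, 0.5
print(conflict_rate(example_preds))       # 0.5: individuals 2 and 3 get mixed labels
```

A high pairwise agreement together with a low conflict rate would correspond to the abstract's observation that many similarly accurate models need not produce severe predictive multiplicity.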

Problem

Research questions and friction points this paper is trying to address.

Model Multiplicity

Predictive Arbitrariness

Recidivism Risk Assessment

High-Stakes Decision-Making

Algorithmic Fairness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Predictive Multiplicity

Recidivism Risk Assessment

Interpretable Machine Learning