🤖 AI Summary
This paper addresses statistical discrimination driven by verifiable beliefs generated by machine learning models, as distinct from traditional human biases.
Method: The paper proposes a belief-state-based intervention mechanism that explicitly incorporates the structure of decision makers' beliefs, departing from belief-agnostic approaches such as affirmative action. The framework combines game theory, mechanism design, and interpretability analysis to impose precise constraints on the belief-formation process.
Contribution/Results: The mechanism is provably robust to belief miscalibration and covariate shift. Theoretical analysis shows that it strictly dominates existing fairness interventions across multiple bias configurations, including label bias, feature bias, and compound bias. By grounding fairness interventions in the epistemic states of algorithmic decision makers, the approach establishes a new paradigm for AI fairness regulation, one that balances theoretical rigor with practical implementability.
📝 Abstract
I study statistical discrimination driven by verifiable beliefs, such as those generated by machine learning, rather than by humans. When beliefs are verifiable, interventions against statistical discrimination can move beyond simple, belief-free designs like affirmative action to more sophisticated ones that constrain decision makers based on what they are thinking. Such mind-reading interventions can perform well where affirmative action does not, even when the minds being read are biased. My theory of belief-contingent intervention design sheds light on influential methods of regulating machine learning, and yields novel interventions robust to covariate shift and incorrect, biased beliefs.
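To make the belief-free versus belief-contingent distinction concrete, here is a minimal toy sketch, not the paper's actual mechanism: a quota rule that constrains outcomes by group without inspecting beliefs, contrasted with a rule that conditions directly on the model's verifiable beliefs. The screening setup, the `quota_rule` and `belief_contingent_rule` functions, and all parameter values are hypothetical assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy screening problem (all names and numbers are illustrative, not from the paper).
# Each candidate has a group label g and a machine-generated, verifiable belief:
# the model's estimate that the candidate is qualified. The belief is deliberately
# biased against group 1 to mimic "incorrect, biased beliefs".
n = 10_000
g = rng.integers(0, 2, size=n)
quality = rng.uniform(size=n)
belief = np.clip(quality - 0.10 * g + rng.normal(0, 0.1, size=n), 0, 1)

def quota_rule(belief, g, hire_rate=0.2):
    """Belief-free intervention (affirmative-action style): the regulator only
    constrains outcomes, requiring the same hire rate in each group; the decision
    maker still ranks candidates by its own beliefs within each group."""
    hires = np.zeros(len(belief), dtype=bool)
    for grp in (0, 1):
        idx = np.where(g == grp)[0]
        k = int(hire_rate * len(idx))
        hires[idx[np.argsort(-belief[idx])[:k]]] = True
    return hires

def belief_contingent_rule(belief, g, cutoff=0.7):
    """Belief-contingent intervention (a toy stand-in for the paper's mechanism):
    because the beliefs are verifiable, the regulator can constrain the mapping
    from beliefs to decisions directly, e.g. "hire exactly when the stated belief
    exceeds a cutoff", so equal stated beliefs imply equal treatment across groups."""
    return belief >= cutoff

for name, rule in [("quota", quota_rule), ("belief-contingent", belief_contingent_rule)]:
    h = rule(belief, g)
    rates = [round(h[g == grp].mean(), 3) for grp in (0, 1)]
    print(f"{name:17s} hire rate by group: {rates}, mean quality of hires: {quality[h].mean():.3f}")
```

In this toy, the quota equalizes hire rates by construction while ignoring what the model believes, whereas the belief-contingent rule constrains the belief-to-decision mapping itself, the kind of constraint that only becomes enforceable when beliefs are verifiable.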