🤖 AI Summary
This work addresses the problem of automatically extracting empirically testable challenges to normative theories, such as expected utility theory, from black-box machine learning models, thereby exposing latent deficiencies in those theories.
Method: We propose a “Theory–Falsifier” game-theoretic framework that casts anomaly generation as an adversarial optimization: a neural network trained on choice data predicts behavior, while a falsifier module synthesizes decision contexts on which the predicted behavior violates the theory; solving for the equilibrium of this game automatically traces the theory’s boundaries. The approach combines differentiable behavioral modeling, adversarial training, and closed-loop experimental validation.
Contribution/Results: Our method reproduces canonical anomalies, including the Allais paradox, and discovers novel ones. In controlled, incentivized behavioral experiments, human subjects violate expected utility theory on the algorithmically generated anomalies at rates comparable to established behavioral phenomena, showing that the generated anomalies challenge the theory as sharply as classic ones. To our knowledge, this is the first systematic framework for translating black-box predictive models into experimentally testable theoretical challenges.
📝 Abstract
Machine learning algorithms can find predictive signals that researchers fail to notice; yet they are notoriously hard to interpret. How can we extract theoretical insights from these black boxes? History provides a clue. Facing a similar problem, how to extract theoretical insights from their intuitions, researchers often turned to "anomalies": constructed examples that highlight flaws in an existing theory and spur the development of new ones. Canonical examples include the Allais paradox and the Kahneman-Tversky choice experiments for expected utility theory. We suggest anomalies can likewise extract theoretical insights from black-box predictive algorithms. We develop procedures to automatically generate anomalies for an existing theory when given a predictive algorithm. We cast anomaly generation as an adversarial game between a theory and a falsifier, the solutions to which are anomalies: instances where the black-box algorithm predicts that, were we to collect data, we would likely observe violations of the theory. As an illustration, we generate anomalies for expected utility theory using a large, publicly available dataset on real lottery choices. Based on an estimated neural network that predicts lottery choices, our procedures recover known anomalies and discover new ones for expected utility theory. In incentivized experiments, subjects violate expected utility theory on these algorithmically generated anomalies; moreover, the violation rates are similar to observed rates for the Allais paradox and common ratio effect.
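The adversarial search described above can be sketched in miniature. Everything in the sketch below is hypothetical and not from the paper: `predict_choose_A` is a toy stand-in for the estimated neural network (with a hard-coded certainty-overweighting term in place of a fitted model), and the falsifier is a plain random search rather than an equilibrium solver. It looks for common-ratio-style pairs of lottery menus on which the predictor forecasts a preference reversal that expected utility theory's independence axiom forbids.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for an estimated neural-network choice predictor.
# Each lottery is (payoff, probability of winning that payoff, else 0).
# The certainty bonus mimics the overweighting of sure outcomes seen in
# behavioral data; a real predictor would be fit to a choice dataset.
def predict_choose_A(A, B):
    xa, pa = A
    xb, pb = B
    ev_gap = xa * pa - xb * pb
    certainty_bonus = 0.3 * ((pa == 1.0) - (pb == 1.0))
    return 1.0 / (1.0 + math.exp(-(0.05 * ev_gap + 4.0 * certainty_bonus)))

# Falsifier: search for common-ratio violations. Expected utility's
# independence axiom implies that the preference between (A, B) must match
# the preference between the pair with all win probabilities scaled by r;
# a predicted preference reversal across the two menus is an anomaly.
def find_anomaly(n_trials=2000, r=0.25):
    best = None
    for _ in range(n_trials):
        xa = random.uniform(10, 100)
        xb = random.uniform(10, 100)
        pb = random.uniform(0.5, 0.95)
        A, B = (xa, 1.0), (xb, pb)      # certain vs. risky lottery
        A2, B2 = (xa, r), (xb, pb * r)  # common-ratio scaled pair
        p1 = predict_choose_A(A, B)
        p2 = predict_choose_A(A2, B2)
        score = (p1 - 0.5) * (0.5 - p2)  # > 0 iff preference reverses
        if score > 0 and (best is None or score > best[0]):
            best = (score, (A, B), (A2, B2))
    return best

anomaly = find_anomaly()
if anomaly is not None:
    score, pair, scaled_pair = anomaly
    print("candidate anomaly:", pair, "vs scaled", scaled_pair)
```

The paper's actual procedure differs in the respects that matter: the predictor is estimated from real lottery-choice data, the falsifier solves an adversarial game rather than sampling at random, and candidate anomalies are then validated in incentivized experiments with human subjects.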