🤖 AI Summary
In strategic learning, decision-makers must balance predictive accuracy with fair incentives for agents to undertake desirable feature modifications. Existing models neglect external incentive effects and agent heterogeneity—such as differing causal structures and manipulation costs—and therefore cannot jointly ensure accuracy and fairness.
Method: We propose the first unified Stackelberg game framework that formally defines “effort-desirability fairness” and characterizes its trade-off with optimality. Our model integrates causal reasoning, constrained optimization, and peer-based learning mechanisms.
Contribution/Results: We theoretically derive an upper bound on the principal’s optimality loss under a given fairness tolerance. Empirically, we demonstrate explicit trade-offs between prediction accuracy and multiple fairness metrics. Our work establishes a novel analytical framework for strategic classification that simultaneously ensures causal interpretability and formal fairness guarantees.
📝 Abstract
Strategic learning studies how decision rules interact with agents who may strategically change their features to achieve better outcomes. Standard models assume the decision-maker's sole goal is to learn a classifier that maximizes an objective (e.g., accuracy), given that agents best respond. However, the goals of real decision-making systems are not limited to producing good predictions. They may also account for the external effects of the incentives they induce, which makes changes to certain features more desirable to the decision-maker than changes to others. Further, the principal may need to incentivize desirable feature changes fairly across heterogeneous agents. How much does this constrained optimization (i.e., maximize the objective while restricting agents' incentive disparity) cost the principal? We propose a unified model of principal-agent interaction that captures this trade-off under three additional components: (1) causal dependencies between features, such that changes in one feature affect others; (2) heterogeneous manipulation costs across agents; and (3) peer learning, through which agents infer the principal's rule. We provide theoretical guarantees on the principal's optimality loss under a given desirability-fairness tolerance, for multiple broad classes of fairness measures. Finally, through experiments on real datasets, we show the explicit trade-off between maximizing accuracy and effort-desirability fairness.
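The abstract does not spell out the paper's cost model or fairness measure, but the setting it describes can be sketched with a standard strategic-classification building block: agents with quadratic manipulation costs best-respond to a linear rule, and agents with higher costs must exert more effort to be accepted. The minimal sketch below (all names and the cost matrices are illustrative assumptions, not the paper's definitions) computes each agent's least-cost best response in closed form and measures the resulting effort disparity between two cost-heterogeneous agents.

```python
import numpy as np

def best_response(x, w, b, C):
    """Least-cost feature change d for an agent with quadratic cost
    (1/2) d^T C d who needs to cross the boundary w^T (x + d) >= b.
    From the KKT conditions: d = ((b - w^T x) / (w^T C^{-1} w)) * C^{-1} w.
    """
    gap = b - w @ x
    if gap <= 0:                      # already accepted: no change needed
        return np.zeros_like(x)
    Cinv_w = np.linalg.solve(C, w)    # C^{-1} w
    return (gap / (w @ Cinv_w)) * Cinv_w

def effort(d, C):
    """Manipulation effort actually expended by the agent."""
    return 0.5 * d @ C @ d

# Two agents with identical features but heterogeneous manipulation costs
w, b = np.array([1.0, 1.0]), 1.5
x = np.array([0.2, 0.3])
C_cheap  = np.eye(2)                  # low-cost group
C_costly = np.diag([4.0, 4.0])        # high-cost group

d1 = best_response(x, w, b, C_cheap)
d2 = best_response(x, w, b, C_costly)

# Both agents just reach the boundary, but at very different effort levels;
# a disparity like this is what a fairness constraint would bound.
disparity = abs(effort(d1, C_cheap) - effort(d2, C_costly))
```

A constrained principal would then search over rules `(w, b)` maximizing accuracy subject to such a disparity staying within a tolerance, which is the trade-off the paper's bounds characterize.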