Prediction with Differential Covariate Classification: Illustrated by Racial/Ethnic Classification in Medical Risk Assessment

📅 2025-01-01
🏛️ Social Science Research Network
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses prediction bias arising from inconsistent covariate classification—e.g., race/ethnicity—between the evidence-generation (study) and decision-making (deployment) stages in evidence translation. We formally introduce the “Differentially Classified Covariates” (DCC) framework, the first of its kind. Leveraging causal inference and partial identification theory, we develop a nonparametric framework for deriving prediction bounds on the conditional probability $P(y mid x)$, characterizing its identifiability limits and conditions under which bounds shrink. Our analysis shows that DCC universally widens prediction intervals; effective tightening requires strong assumptions—such as known classification mechanisms or observable proxy variables. Empirically, we demonstrate that racial misclassification in clinical risk prediction can double predictive uncertainty, exposing a previously overlooked risk of evidentiary failure in public policy and healthcare practice.

Technology Category

Application Category

📝 Abstract
A common practice in evidence-based decision-making uses estimates of conditional probabilities P(y|x) obtained from research studies to predict outcomes y on the basis of observed covariates x. Given this information, decisions are then based on the predicted outcomes. Researchers commonly assume that the predictors used in the generation of the evidence are the same as those used in applying the evidence: i.e., the meaning of x in the two circumstances is the same. This may not be the case in real-world settings. Across a wide-range of settings, ranging from clinical practice or education policy, demographic attributes (e.g., age, race, ethnicity) are often classified differently in research studies than in decision settings. This paper studies identification in such settings. We propose a formal framework for prediction with what we term differential covariate classification (DCC). Using this framework, we analyze partial identification of probabilistic predictions and assess how various assumptions influence the identification regions. We apply the findings to a range of settings, focusing mainly on differential classification of individuals' race and ethnicity in clinical medicine. We find that bounds on P(y|x) can be wide, and the information needed to narrow them available only in special cases. These findings highlight an important problem in using evidence in decision making, a problem that has not yet been fully appreciated in debates on classification in public policy and medicine.
Problem

Research questions and friction points this paper is trying to address.

Addresses mismatched covariate definitions between research and application
Develops framework for prediction under differential covariate classification
Analyzes identification bounds for probabilistic predictions in real-world settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes differential covariate classification (DCC) framework
Analyzes partial identification of probabilistic predictions
Applies framework to differential race classification in medicine
🔎 Similar Papers
No similar papers found.
A
Atheendar S. Venkataramani
University of Pennsylvania
Charles F. Manski
Charles F. Manski
Northwestern University
econometrics and statisticsjudgment and decisionpublic policy
J
John Mullahy
University of Wisconsin- Madison