Searching for local associations while controlling the false discovery rate

📅 2024-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of detecting context-specific, local associations between explanatory and outcome variables under sample heterogeneity, where relationships vary with covariates. We propose the “local conditional hypothesis” framework, which adaptively identifies such associations while controlling the false discovery rate (FDR). Methodologically, we extend model-X knockoffs to the local hypothesis setting, so that hypotheses can be generated from the data and then tested without selection bias and without sample splitting. The framework accommodates arbitrary machine learning models as association detectors and combines data-driven construction of local hypotheses with FDR control. Applied to a genetic analysis of waist-to-hip ratio in the UK Biobank, it uncovered sex-specific genetic effects. Numerical experiments demonstrate high statistical power alongside low false discovery rates.

📝 Abstract
We introduce local conditional hypotheses that express how the relation between explanatory variables and outcomes changes across different contexts, described by covariates. By expanding upon the model-X knockoff filter, we show how to adaptively discover these local associations, all while controlling the false discovery rate. Our enhanced inferences can help explain sample heterogeneity and uncover interactions, making better use of the capabilities offered by modern machine learning models. Specifically, our method is able to leverage any model for the identification of data-driven hypotheses pertaining to different contexts. Then, it rigorously tests these hypotheses without succumbing to selection bias. Importantly, our approach is efficient and does not require sample splitting. We demonstrate the effectiveness of our method through numerical experiments and by studying the genetic architecture of Waist-Hip-Ratio across different sexes in the UK Biobank.
Problem

Research questions and friction points this paper is trying to address.

Detect local associations between variables and outcomes across contexts
Control false discovery rate in adaptive hypothesis testing
Leverage machine learning to identify and test data-driven hypotheses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends the model-X knockoff filter to local conditional hypotheses
Identifies data-driven hypotheses adaptively
Controls false discovery rate rigorously
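
For readers unfamiliar with the model-X knockoff filter that the method builds on, the sketch below shows only the standard knockoff+ selection step. It is not the paper's local-hypothesis procedure. Given one statistic W_j per hypothesis (large positive values suggest a real association, and signs are symmetric under the null), it finds the smallest threshold t at which the estimated false discovery proportion (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) falls to the target level q or below. The function names and the example statistics are illustrative assumptions, and computing the W_j themselves (generating knockoffs and fitting a model) is out of scope here.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    Scans the candidate thresholds |w_j| > 0 in increasing order and returns
    the first one whose estimated false discovery proportion is at most q.
    Returns infinity when no threshold qualifies (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false positives
        positives = sum(1 for x in w if x >= t)   # hypotheses that would be selected
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Return the indices of hypotheses selected by the knockoff+ filter."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten hypotheses (made-up numbers, not from the paper).
example_w = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 2.0, 0.2, -0.1, 1.5]
example_selected = knockoff_select(example_w, q=0.2)
print(example_selected)  # [0, 1, 2, 3, 4, 6, 9]
```

In this toy input the threshold lands at 1.5, so the small positive statistic 0.2 is not selected even though it is positive. The extra 1 in the numerator makes the FDP estimate conservative, which is the feature of the knockoff+ variant that gives FDR control.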
Paula Gablenz
Department of Statistics, Stanford University, California, USA
M. Sesia
Departments of Data Sciences and Operations, and of Computer Science, University of Southern California, California, USA
Tianshu Sun
Dean's Distinguished Chair Professor of Information Systems, Cheung Kong Graduate School of Business
Digital Platform · Analytics & AI & Experimentation · Data Value & Privacy & Regulation · Social
C. Sabatti
Departments of Biomedical Data Science, and of Statistics, Stanford University, California, USA