🤖 AI Summary
This study addresses the challenge of detecting context-specific, local associations between explanatory and outcome variables under sample heterogeneity, where relationships vary with covariates. We propose the “local conditional hypothesis” framework, which adaptively identifies such associations while controlling the false discovery rate (FDR). Methodologically, we extend model-X knockoffs to the local hypothesis setting, so that hypotheses can be generated from the data and then tested without selection bias and without sample splitting. The framework accommodates arbitrary machine learning models as association detectors and combines data-driven construction of local hypotheses with FDR control. Applied to a genetic analysis of waist-to-hip ratio in the UK Biobank, it uncovered sex-specific genetic effects. Numerical experiments show high statistical power together with low false discovery rates, balancing interpretability with statistical rigor.
📝 Abstract
We introduce local conditional hypotheses that express how the relation between explanatory variables and outcomes changes across different contexts, described by covariates. By expanding upon the model-X knockoff filter, we show how to adaptively discover these local associations, all while controlling the false discovery rate. Our enhanced inferences can help explain sample heterogeneity and uncover interactions, making better use of the capabilities offered by modern machine learning models. Specifically, our method is able to leverage any model for the identification of data-driven hypotheses pertaining to different contexts. It then rigorously tests these hypotheses without succumbing to selection bias. Importantly, our approach is efficient and does not require sample splitting. We demonstrate the effectiveness of our method through numerical experiments and by studying the genetic architecture of waist-hip ratio across different sexes in the UK Biobank.
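For readers unfamiliar with the knockoff filter that the abstract builds on, the following is a minimal sketch of the standard (global) knockoff+ selection rule, not the local extension proposed in the paper. It assumes that a vector `W` of knockoff feature statistics has already been computed by some model, with large positive values suggesting a real association. The function name `knockoff_plus_select` and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Select variables with the standard knockoff+ threshold.

    W : knockoff statistics, one per variable (assumed precomputed).
    q : nominal false discovery rate level.
    Returns the indices of the selected variables.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Conservative estimate of the false discovery proportion at t:
        # negatives at or below -t stand in for false positives.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target level.
            return np.flatnonzero(W >= t)
    # No threshold achieves the target level: select nothing.
    return np.array([], dtype=int)

# Hypothetical statistics for ten variables.
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, 6, 7]
selected = knockoff_plus_select(W_example, q=0.2)
```

With these illustrative values and `q=0.2`, the rule stops at the threshold 1.5 and selects the eight variables whose statistics are at least that large. The method described in the abstract differs in that its hypotheses are local, defined within data-driven contexts, so this sketch only conveys the thresholding idea it extends.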