🤖 AI Summary
This work proposes a framework for estimating and controlling the local false discovery rate (lfdr) when the null distribution is assumed only to be symmetric about zero and is otherwise unspecified. The method estimates the marginal density ratio $f(-w)/f(w)$, for $w>0$, by logistic regression on a natural cubic spline basis, and constructs an lfdr estimator from this ratio. The theoretical analysis establishes the consistency and asymptotic properties of the estimator. In particular, when the estimator converges uniformly, thresholding it at the nominal level asymptotically controls the lfdr in multiple testing. This study is the first to achieve asymptotic lfdr control under such minimal assumptions, and it makes explicit how estimation accuracy translates into multiple testing performance.
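The estimator described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the density ratio is obtained as the fitted odds of the sign indicator $1\{W<0\}$ given $|W|$. That choice works because $P(W<0 \mid |W|=t)/P(W>0 \mid |W|=t) = f(-t)/f(t)$ for $t>0$.

The following are all hypothetical choices made for illustration:
- the function names `natural_spline_basis`, `fit_logistic`, `estimate_ratio` and `reject`;
- the five knots placed at quantiles of $|W|$;
- the tiny ridge penalty;
- the rule of rejecting only positive statistics;
- the simulated mixture of $N(0,1)$ nulls and $N(3,1)$ non-nulls.

```python
import math
import random

# Hypothetical illustration of a spline-logistic estimate of f(-t)/f(t); not the authors' code.


def natural_spline_basis(x, knots):
    """Natural cubic spline basis [1, x, N_3(x), ..., N_K(x)] in the truncated-power
    form used in The Elements of Statistical Learning (Hastie, Tibshirani, Friedman)."""
    last, K = knots[-1], len(knots)

    def d(k):
        # d_k(x) = ((x - xi_k)_+^3 - (x - xi_K)_+^3) / (xi_K - xi_k)
        return (max(x - knots[k], 0.0) ** 3 - max(x - last, 0.0) ** 3) / (last - knots[k])

    # Differences d_k - d_{K-1} make each basis function linear beyond the boundary knots.
    return [1.0, x] + [d(k) - d(K - 2) for k in range(K - 2)]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def penalized_loglik(X, y, beta, ridge):
    """Logistic log-likelihood minus a small ridge penalty (intercept unpenalized)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # log(1 + e^eta) computed stably
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll - 0.5 * ridge * sum(b * b for b in beta[1:])


def fit_logistic(X, y, ridge=1e-6, iters=50, tol=1e-7):
    """Newton-Raphson for ridge-penalized logistic regression, with step halving."""
    p = len(X[0])
    beta = [0.0] * p
    current = penalized_loglik(X, y, beta, ridge)
    for _ in range(iters):
        grad = [-ridge * b if a > 0 else 0.0 for a, b in enumerate(beta)]
        hess = [[ridge if (a == c and a > 0) else 0.0 for c in range(p)] for a in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 0.5 * (1.0 + math.tanh(0.5 * eta))  # numerically stable sigmoid
            wt = mu * (1.0 - mu)
            for a in range(p):
                grad[a] += (yi - mu) * xi[a]
                for c in range(p):
                    hess[a][c] += wt * xi[a] * xi[c]
        step = solve(hess, grad)  # hess is the negative Hessian, so this is an ascent step
        if max(abs(s) for s in step) < tol:
            break
        scale = 1.0
        while True:
            trial = [b + scale * s for b, s in zip(beta, step)]
            value = penalized_loglik(X, y, trial, ridge)
            if value >= current or scale < 1e-8:
                break
            scale *= 0.5
        beta, current = trial, value
    return beta


def estimate_ratio(w, n_knots=5):
    """Estimate r(t) = f(-t)/f(t), t > 0, as the fitted odds of 1{W < 0} given |W| = t."""
    data = [v for v in w if v != 0.0]
    t = sorted(abs(v) for v in data)
    # Knots at interior quantiles of |W| (an illustrative choice).
    knots = [t[int((k + 1) / (n_knots + 1) * (len(t) - 1))] for k in range(n_knots)]
    X = [natural_spline_basis(abs(v), knots) for v in data]
    y = [1.0 if v < 0 else 0.0 for v in data]
    beta = fit_logistic(X, y)

    def r(x):
        eta = sum(b * v for b, v in zip(beta, natural_spline_basis(x, knots)))
        return math.exp(eta)

    return r


def reject(w, r, alpha):
    """Reject H_i when w_i > 0 and the estimated lfdr min(1, r(w_i)) is at most alpha."""
    return [i for i, v in enumerate(w) if v > 0 and min(1.0, r(v)) <= alpha]


# Simulated example: 80% N(0, 1) nulls and 20% N(3, 1) non-nulls.
random.seed(0)
w = [random.gauss(0, 1) if random.random() < 0.8 else random.gauss(3, 1) for _ in range(6000)]
r = estimate_ratio(w)
rejected = reject(w, r, alpha=0.1)
print(f"r(0.5) = {r(0.5):.3f}, r(3.0) = {r(3.0):.3f}, rejections = {len(rejected)}")
```

Because a natural cubic spline is linear beyond its boundary knots, the fitted log-odds extrapolate linearly into the right tail, where few negative statistics are observed. The estimate is capped at 1 before thresholding because an lfdr cannot exceed one.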
📝 Abstract
This paper is concerned with estimating the local false discovery rate (lfdr) in a two-groups model where the only assumption regarding the null distribution is symmetry about zero. Our motivation comes from the contemporary framework for multiple hypothesis testing, particularly relevant in variable selection problems. This framework transforms arbitrary user-specified scores into statistics whose null distributions are symmetric about zero, whereas the non-null statistics are generally expected to be enriched to the right of zero. Modern methods such as the knockoff filter (Barber and Candès, 2015) are able to exploit this null property to control the false discovery rate (FDR). An arguably more appropriate goal, however, is to control the local false discovery rate of the rejected hypotheses, as proposed by Soloff et al. (2024), who analyze the standard two-groups model (known $f_0$ and independence). Here, we take a step in this direction and propose to estimate the lfdr by targeting the surrogate density ratio $f(-w)/f(w)$, for $w>0$, where $f$ is the marginal density in the aforementioned "stripped-down" two-groups model. We study several estimators and propose a logistic regression-based method with a natural cubic spline basis. We also show that any consistent estimator of this surrogate yields asymptotic lfdr control for the multiple testing procedure that thresholds the estimate at the nominal level.
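A short derivation shows why $f(-w)/f(w)$ is a sensible target. It assumes the usual two-groups decomposition $f = \pi_0 f_0 + (1-\pi_0) f_1$ with $\mathrm{lfdr}(w) = \pi_0 f_0(w)/f(w)$; the symbols $\pi_0$ and $f_1$ are introduced here for illustration. Symmetry of $f_0$ turns the ratio into an upper bound on the lfdr. The same ratio is also the odds of a negative sign given the magnitude, which is one way a logistic regression can target it:

$$
\begin{aligned}
\mathrm{lfdr}(w) = \frac{\pi_0 f_0(w)}{f(w)} = \frac{\pi_0 f_0(-w)}{f(w)}
&\le \frac{\pi_0 f_0(-w) + (1-\pi_0) f_1(-w)}{f(w)} = \frac{f(-w)}{f(w)}, \\
\frac{P(W<0 \mid |W| = w)}{P(W>0 \mid |W| = w)} &= \frac{f(-w)}{f(w)}, \qquad w>0.
\end{aligned}
$$

The inequality holds with equality exactly when $f_1(-w) = 0$, that is, when the non-nulls put no mass at $-w$. Hence the surrogate is conservative, and it is tight when non-null enrichment lies entirely to the right of zero. Since an lfdr is at most 1, one natural plug-in estimate is $\min\{1, \widehat{f(-w)/f(w)}\}$.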