🤖 AI Summary
In resource-constrained multiple testing, exhaustive evaluation of all hypotheses and computation of exact test statistics (e.g., via experiments or precise calculations) is infeasible.
Method: This paper proposes a surrogate-driven active testing framework that leverages auxiliary information—such as expert judgment, ML predictions, or historical data—to construct surrogate test statistics. It dynamically decides whether to invoke costly exact tests; otherwise, it substitutes the surrogate values directly.
Contribution/Results: The framework is the first to enable compatible p-value and e-value constructions under arbitrary dependence structures—without requiring independence between surrogates and true statistics—while provably controlling the false discovery rate (FDR). By unifying active learning, multiple testing theory, and e-value theory, it achieves both theoretical rigor and practical utility. Empirical evaluation on scCRISPR causal effect analysis demonstrates a 32% increase in discoveries and a 68% reduction in computational cost compared to exhaustive testing, under identical FDR constraints.
📝 Abstract
Researchers often lack the resources to test every hypothesis of interest directly or compute test statistics comprehensively, but often possess auxiliary data from which we can compute an estimate of the experimental outcome. We introduce a novel approach for selecting which hypotheses to query a statistic (i.e., run an experiment, perform expensive computation, etc.) in a hypothesis testing setup by leveraging estimates (e.g., from experts, machine learning models, previous experiments, etc.) to compute proxy statistics. Our framework allows a scientist to propose a proxy statistic, and then query the true statistic with some probability based on the value of the proxy. We make no assumptions about how the proxy is derived and it can be arbitrarily dependent with the true statistic. If the true statistic is not queried, the proxy is used in its place. We characterize"active"methods that produce valid p-values and e-values in this setting and utilize this framework in the multiple testing setting to create procedures with false discovery rate (FDR) control. Through simulations and real data analysis of causal effects in scCRISPR screen experiments, we empirically demonstrate that our proxy framework has both high power and low resource usage when our proxies are accurate estimates of the respective true statistics.