False Discovery Rate Control via Bayesian Mirror Statistic

📅 2025-10-01

🤖 AI Summary

Addressing challenges in high-dimensional variable selection—including stringent false discovery rate (FDR) control and efficiency loss from data splitting—this paper introduces the first Bayesian mirror statistic framework. Instead of partitioning data, it directly constructs mirror statistics from the posterior distribution of regression coefficients within a unified Bayesian model. The approach accommodates both continuous and discrete responses, as well as complex models (e.g., mixed-effects models), leveraging automatic differentiation variational inference (ADVI) with continuous priors for scalable and accurate posterior approximation. Theoretically, it guarantees strict FDR control under mild regularity conditions. Empirically, it achieves superior statistical power and robust false positive control in high-dimensional settings. The core contribution is the development of a principled, flexible, and extensible Bayesian mirror statistic paradigm—unifying rigorous FDR control, modeling adaptability, and computational tractability.

📝 Abstract

Simultaneously performing variable selection and inference in high-dimensional models is an open challenge in statistics and machine learning. As datasets with very large numbers of variables become more common, dedicated statistical procedures are needed to select the most important predictors in a high-dimensional space while controlling some form of selection error. In this work we adapt the Mirror Statistic approach to False Discovery Rate (FDR) control to a Bayesian modelling framework. The Mirror Statistic, developed in the classical frequentist framework, is a flexible method for controlling the FDR that requires only mild model assumptions. However, it needs two independent sets of regression coefficient estimates, which are usually obtained by splitting the original dataset. Here we propose to rely on a Bayesian formulation of the model and to use the posterior distributions of the coefficients of interest to build the Mirror Statistic and control the FDR without splitting the data. The method is also flexible: it can be used with continuous and discrete outcomes and with more complex predictor structures, such as those in mixed models. We keep the approach scalable to high dimensions by relying on Automatic Differentiation Variational Inference and fully continuous prior choices.
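
To make the frequentist starting point concrete, here is a minimal sketch of the classical data-splitting Mirror Statistic that the abstract refers to. It assumes two independent coefficient estimates per variable and uses one common form of the statistic, M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|), with a data-driven cutoff on the estimated false discovery proportion. The function names and the NumPy implementation are illustrative choices, not taken from the paper.

```python
import numpy as np

def mirror_statistics(beta1, beta2):
    """Mirror statistic per variable from two independent estimates.

    Large positive values suggest a real signal (both estimates agree in
    sign and are large); null variables are roughly symmetric around zero.
    """
    beta1 = np.asarray(beta1, dtype=float)
    beta2 = np.asarray(beta2, dtype=float)
    return np.sign(beta1 * beta2) * (np.abs(beta1) + np.abs(beta2))

def fdr_threshold(m, q):
    """Smallest candidate t with #{M_j < -t} / max(#{M_j > t}, 1) <= q.

    Candidates are the observed nonzero |M_j| values, scanned in
    increasing order. Returns np.inf if no candidate qualifies.
    """
    for t in np.sort(np.abs(m[m != 0])):
        # The left tail estimates how many nulls sit in the right tail.
        fdp_hat = np.sum(m < -t) / max(np.sum(m > t), 1)
        if fdp_hat <= q:
            return t
    return np.inf

def select_variables(beta1, beta2, q=0.1):
    """Indices of variables whose mirror statistic exceeds the cutoff."""
    m = mirror_statistics(beta1, beta2)
    return np.flatnonzero(m > fdr_threshold(m, q))

# Hypothetical estimates from the two halves of a split dataset.
b_half1 = [3.0, 2.5, 0.1, -0.2, 4.0, 0.05]
b_half2 = [2.8, 2.2, -0.1, 0.1, 3.5, 0.02]
selected = select_variables(b_half1, b_half2, q=0.1)
```

The cutoff works because null variables are equally likely to fall on either side of zero, so the count in the left tail serves as an estimate of the false positives in the right tail.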

Problem

Research questions and friction points this paper is trying to address.

Controls false discovery rate in high-dimensional variable selection

Adapts frequentist mirror statistic to Bayesian modeling framework

Eliminates data splitting requirement using posterior distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Mirror Statistic for FDR control

Uses posterior distributions without data splitting

Scalable via variational inference with continuous priors
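
The idea of replacing data splitting with posterior draws can be illustrated with a rough sketch. The summary does not spell out the paper's exact construction, so everything below is an assumption made for illustration: pairs of posterior draws stand in for the two independent estimates, a mirror statistic and cutoff are computed for each pair, and selections are averaged into inclusion frequencies. The function name `posterior_mirror_inclusion` and its parameters are hypothetical, and this sketch makes no claim to reproduce the paper's FDR guarantee.

```python
import numpy as np

def posterior_mirror_inclusion(draws, q=0.1, n_pairs=200, seed=0):
    """Inclusion frequency per coefficient from paired posterior draws.

    draws: array of shape (n_draws, p), e.g. samples from a variational
    approximation of the posterior. ASSUMPTION: two distinct draws play
    the role of the two independent estimates used in data splitting.
    """
    rng = np.random.default_rng(seed)
    draws = np.asarray(draws, dtype=float)
    n_draws, p = draws.shape
    counts = np.zeros(p)
    for _ in range(n_pairs):
        i, k = rng.choice(n_draws, size=2, replace=False)
        b1, b2 = draws[i], draws[k]
        # Mirror statistic for this pair of draws.
        m = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))
        # Smallest candidate cutoff whose estimated FDP is at most q.
        tau = np.inf
        for t in np.sort(np.abs(m[m != 0])):
            if np.sum(m < -t) / max(np.sum(m > t), 1) <= q:
                tau = t
                break
        counts += m > tau
    return counts / n_pairs

# Simulated stand-in for posterior samples: 3 clear signals, 17 nulls.
sim_rng = np.random.default_rng(1)
sim_draws = sim_rng.normal(0.0, 0.3, size=(500, 20))
sim_draws[:, :3] += 3.0
inclusion = posterior_mirror_inclusion(sim_draws, q=0.1)
```

Coefficients whose posterior mass sits far from zero are selected in nearly every pair, while coefficients with a posterior centred on zero flip sign between draws and are selected only occasionally.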
