Fair Bayesian Data Selection via Generalized Discrepancy Measures

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fairness interventions for machine learning models in high-stakes settings predominantly operate at the model level, incurring high computational overhead and generalizing poorly. This paper proposes a Bayesian data selection framework that, for the first time, incorporates generalized distributional discrepancy measures—such as Wasserstein distance, maximum mean discrepancy (MMD), and f-divergences—into Bayesian data weighting. By aligning the posterior distributions over model parameters and sample weights across demographic groups to a shared central distribution, the method mitigates bias at the data source without imposing explicit fairness constraints, and it is both theoretically grounded and geometry-aware. Extensive experiments on multiple benchmark datasets demonstrate that the approach significantly outperforms state-of-the-art data-level and model-level fairness methods on both fairness metrics—including equal opportunity difference and statistical parity—and predictive accuracy.

📝 Abstract
Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level, they often suffer from high computational costs, limited scalability, and poor generalization. To address these challenges, we propose a Bayesian data selection framework that ensures fairness by aligning group-specific posterior distributions of model parameters and sample weights with a shared central distribution. Our framework supports flexible alignment via various distributional discrepancy measures, including Wasserstein distance, maximum mean discrepancy, and $f$-divergence, allowing geometry-aware control without imposing explicit fairness constraints. This data-centric approach mitigates group-specific biases in training data and improves fairness in downstream tasks, with theoretical guarantees. Experiments on benchmark datasets show that our method consistently outperforms existing data selection and model-based fairness methods in both fairness and accuracy.
Problem

Research questions and friction points this paper is trying to address.

Addresses fairness concerns in machine learning model deployment
Mitigates group-specific biases in training data selection
Achieves fairness without explicit constraints via a Bayesian framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian data selection framework ensures fairness
Aligns group-specific posteriors with central distribution
Uses flexible discrepancy measures for geometry-aware control
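The core idea above—penalizing the discrepancy between each group's posterior and a shared central distribution—can be sketched with one of the discrepancy measures the paper names, MMD. This is an illustrative toy, not the paper's actual algorithm: the function names, the RBF kernel choice, and the crude "central" reference (a sample average of the groups) are all assumptions for demonstration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel on all pairs of rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased (V-statistic) estimate of squared MMD between the
    # distributions that generated samples X and Y
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
# Stand-ins for posterior samples (parameters + weights) per group
group_a = rng.normal(0.0, 1.0, size=(200, 2))
group_b = rng.normal(0.5, 1.0, size=(200, 2))   # group B is shifted (biased)
center = (group_a + group_b) / 2.0              # crude central reference

# Alignment penalty: pull both group posteriors toward the center
penalty = mmd2(group_a, center) + mmd2(group_b, center)
```

In the paper's framework this kind of penalty would enter the data-weighting objective, so that sample weights are chosen to shrink the group-to-center discrepancies; swapping `mmd2` for a Wasserstein distance or an f-divergence estimator gives the other geometry-aware variants mentioned in the abstract.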
Yixuan Zhang
School of Statistics and Data Science, Southeast University, China
Jiabin Luo
School of Software and Microelectronics, Peking University, China
Zhenggang Wang
School of Statistics and Data Science, Southeast University, China
Feng Zhou
Center for Applied Statistics and School of Statistics, Renmin University of China, China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, China
Quyu Kong
Alibaba Cloud
Multimodal LLM · Information Diffusion Modeling · Machine Learning