Measuring Database Unfairness via Dependency Quantification Under Differential Privacy

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of evaluating dataset fairness and reliability under differential privacy (DP) constraints, where privacy noise and restricted data access hinder direct assessment. It proposes a formal framework for quantifying data unfairness under DP, built on three desiderata: positivity, monotonicity, and DP computability. The framework is instantiated through three complementary unfairness measures: a mutual information-based measure with a total variation distance proxy suited to DP, a data repair-based measure approximated via a reduction to weighted MaxSAT, and a top-k tuple contribution measure that isolates the records most influential in fairness violations. Privacy-preserving algorithms are designed for each measure and analyzed for sensitivity, accuracy, and efficiency. Experiments on multiple real-world datasets show that the measures faithfully approximate their non-private counterparts, quantify bias under privacy constraints, and provide insights for data management.

📝 Abstract

Differential privacy (DP) has become the de facto standard for protecting sensitive data, providing strong guarantees that published statistics or models reveal limited information about any individual. However, privacy noise and restricted data access make it increasingly difficult to assess the fairness and reliability of private datasets. In this paper, we propose a formal framework for quantifying data unfairness under DP. We identify three core desiderata for unfairness measures based on previous work: positivity, monotonicity, and DP computability. We further instantiate them through three complementary measures: (1) a mutual information-based measure with a total variation distance proxy suitable for DP, (2) a data repair-based measure approximated via a reduction to weighted MaxSAT, and (3) a top-$k$ tuple contribution measure that isolates the most influential records in fairness violations. We design privacy-preserving algorithms and analyze their sensitivity, accuracy, and efficiency. Extensive experiments on multiple real-world datasets demonstrate that our proposed measures faithfully approximate their non-private counterparts, effectively quantify bias under privacy constraints, and provide insights for data management.
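
To make the first measure more concrete, the following is a minimal sketch of one way a total variation distance proxy for dependence could be computed under DP. It is not the paper's algorithm: the function names (`tv_dependence`, `dp_tv_dependence`, `laplace_noise`), the choice of the Laplace mechanism on a contingency table, and the add/remove-one-tuple neighboring relation are all assumptions made for illustration. The proxy is the total variation distance between the empirical joint distribution of a sensitive attribute and an outcome attribute and the product of their marginals, which is 0 when the two attributes are empirically independent.

```python
import random


def tv_dependence(counts):
    """Total variation distance between the empirical joint distribution
    of two attributes and the product of its marginals.

    `counts` is a contingency table (list of rows) of non-negative numbers.
    Returns 0.0 when the attributes are empirically independent.
    """
    total = sum(sum(row) for row in counts)
    if total <= 0:
        return 0.0
    row_marg = [sum(row) / total for row in counts]
    col_marg = [sum(col) / total for col in zip(*counts)]
    return 0.5 * sum(
        abs(counts[i][j] / total - row_marg[i] * col_marg[j])
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponentials, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_tv_dependence(counts, epsilon, rng):
    """Hypothetical epsilon-DP estimate of `tv_dependence`.

    Assumption: neighboring databases differ by adding or removing one
    tuple, so the contingency table has L1 sensitivity 1 and Laplace noise
    with scale 1/epsilon per cell suffices. Clipping and the TV computation
    are post-processing and consume no additional privacy budget.
    """
    noisy = [
        [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in row]
        for row in counts
    ]
    return tv_dependence(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    table = [[50, 0], [0, 50]]  # toy data: outcome fully determined by group
    print("non-private:", tv_dependence(table))
    print("epsilon = 1:", dp_tv_dependence(table, 1.0, rng))
```

On the toy table above the non-private value is 0.5, and the private estimate fluctuates around it with noise that shrinks as epsilon or the table counts grow. A production implementation would also need a sampler that is robust to floating-point attacks, which this sketch does not attempt.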

Problem

Research questions and friction points this paper is trying to address.

differential privacy

data unfairness

fairness measurement

database bias

privacy-preserving

Innovation

Methods, ideas, or system contributions that make the work stand out.

differential privacy

fairness quantification

mutual information