🤖 AI Summary
This work studies an information-theoretic formulation of fair representation learning: maximizing task relevance $I(Y;T)$ subject to a bounded statistical-parity (leakage) constraint $I(Y;S) \leq \epsilon$ and a compression (rate) constraint $I(Y;X) \leq r$. The analysis builds on extended versions of the Functional Representation Lemma and the Strong Functional Representation Lemma, which rely on randomization over the useful data $X$ or the sensitive data $S$, yielding upper and lower bounds on the achievable utility. For perfect demographic parity ($\epsilon = 0$), a tighter version of the Strong Functional Representation Lemma improves existing lower bounds, and new upper bounds are proposed. The bounds further show that permitting controlled leakage ($\epsilon > 0$) can strictly improve representation utility, relaxing the restrictive perfect-fairness assumption. The bounds are evaluated and compared in a numerical example, and the problem admits a dual interpretation as code design with bounded leakage and bounded-rate privacy, with the sensitive attribute playing the role of a secret.
📝 Abstract
In this paper, we study an information-theoretic problem of designing a fair representation that attains bounded statistical (demographic) parity. More specifically, an agent uses some useful data $X$ to solve a task $T$. Since both $X$ and $T$ are correlated with a sensitive attribute or secret $S$, the agent designs a representation $Y$ that satisfies a bounded statistical parity and/or privacy leakage constraint, that is, $I(Y;S) \leq \epsilon$. Here, we relax perfect demographic (statistical) parity and instead consider a bounded-parity constraint. We design the representation $Y$ that maximizes the mutual information $I(Y;T)$ about the task while satisfying a bounded compression (encoding-rate) constraint, $I(Y;X) \leq r$, and, simultaneously, the bounded statistical parity constraint $I(Y;S) \leq \epsilon$. To design $Y$, we use extended versions of the Functional Representation Lemma and the Strong Functional Representation Lemma, which are based on randomization techniques, and we study the tightness of the obtained bounds in special cases. The main idea behind the lower bounds is randomization over the useful data $X$ or the sensitive data $S$. For perfect demographic parity, i.e., $\epsilon = 0$, we improve the existing lower bounds by using a tighter version of the Strong Functional Representation Lemma and propose new upper bounds. We then derive upper and lower bounds for the main problem and show that allowing non-zero leakage can improve the attained utility. Finally, we study the bounds and compare them in a numerical example. The problem studied in this paper can also be interpreted as one of code design with bounded leakage and bounded-rate privacy, treating the sensitive attribute as a secret.
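To make the three mutual-information quantities in the constraint set concrete, the following is a minimal numerical sketch (not the paper's construction) on a hypothetical toy joint distribution: binary $S$ and $X$ are correlated, the task is taken as $T = X$ for simplicity, and a randomized representation $Y$ is obtained by flipping $X$ with probability `delta`. Increasing the randomization lowers the leakage $I(Y;S)$ at the cost of utility $I(Y;T)$, illustrating the trade-off that the paper's bounds characterize.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits, computed from a joint pmf matrix p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal p(a)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal p(b)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

# Hypothetical joint pmf p(s, x): X is a noisy copy of the sensitive S.
p_sx = np.array([[0.4, 0.1],
                 [0.1, 0.4]])  # rows index s, columns index x

def evaluate(delta):
    """Y flips X with prob. delta; return (I(Y;T), I(Y;S), I(Y;X)) with T = X."""
    p_y_given_x = np.array([[1 - delta, delta],
                            [delta, 1 - delta]])
    p_sy = p_sx @ p_y_given_x                 # joint p(s, y)
    p_x = p_sx.sum(axis=0)                    # marginal p(x)
    p_xy = p_x[:, None] * p_y_given_x         # joint p(x, y)
    i_yx = mutual_information(p_xy)           # rate I(Y;X)
    i_ys = mutual_information(p_sy)           # leakage I(Y;S)
    i_yt = i_yx                               # T = X here, so I(Y;T) = I(Y;X)
    return i_yt, i_ys, i_yx

# More randomization => less leakage, but also less utility.
u0, leak0, _ = evaluate(0.0)   # deterministic Y = X
u1, leak1, _ = evaluate(0.2)   # randomized Y
assert leak1 < leak0 and u1 < u0
```

With this toy model, any target leakage level $\epsilon$ can be met by choosing `delta` large enough, and the resulting $I(Y;T)$ is the utility attained at that leakage; the paper's contribution is to bound the best achievable utility over all randomized mechanisms, not just this one-parameter family.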