🤖 AI Summary
This work studies an information-theoretic formulation of fair representation learning: maximizing task relevance $I(Y;T)$ subject to a bounded statistical-parity (leakage) constraint $I(Y;S) \leq \epsilon$ and a compression (rate) constraint $I(Y;X) \leq r$. The analysis builds on extended versions of the Functional Representation Lemma and the Strong Functional Representation Lemma, which rely on randomization over the useful data $X$ or the sensitive data $S$, yielding upper and lower bounds on the achievable utility. For perfect demographic parity ($\epsilon = 0$), a tighter version of the Strong Functional Representation Lemma improves existing lower bounds, and new upper bounds are proposed. The bounds further show that permitting controlled leakage ($\epsilon > 0$) can strictly improve representation utility, relaxing the restrictive perfect-fairness assumption. The bounds are evaluated and compared in a numerical example, and the problem admits a dual interpretation as code design with bounded leakage and bounded-rate privacy, with the sensitive attribute playing the role of a secret.
📝 Abstract
In this paper, we study an information-theoretic problem of designing a fair representation that attains bounded statistical (demographic) parity. More specifically, an agent uses some useful data $X$ to solve a task $T$. Since both $X$ and $T$ are correlated with a sensitive attribute or secret $S$, the agent designs a representation $Y$ that satisfies a bounded statistical parity and/or privacy leakage constraint, that is, $I(Y;S) \leq \epsilon$. Here, we relax perfect demographic (statistical) parity and instead consider a bounded-parity constraint. We design the representation $Y$ that maximizes the mutual information $I(Y;T)$ about the task while satisfying a bounded compression (encoding-rate) constraint, $I(Y;X) \leq r$, and, simultaneously, the bounded statistical parity constraint $I(Y;S) \leq \epsilon$. To design $Y$, we use extended versions of the Functional Representation Lemma and the Strong Functional Representation Lemma, which are based on randomization techniques, and we study the tightness of the obtained bounds in special cases. The main idea behind the lower bounds is randomization over the useful data $X$ or the sensitive data $S$. For perfect demographic parity, i.e., $\epsilon = 0$, we improve the existing lower bounds by using a tighter version of the Strong Functional Representation Lemma and propose new upper bounds. We then derive upper and lower bounds for the main problem and show that allowing non-zero leakage can improve the attained utility. Finally, we study the bounds and compare them in a numerical example. The problem studied in this paper can also be interpreted as one of code design with bounded leakage and bounded-rate privacy, treating the sensitive attribute as a secret.
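To make the three mutual-information quantities in the constraint set concrete, the following is a minimal numerical sketch (not the paper's construction) on a hypothetical toy joint distribution: binary $S$ and $X$ are correlated, the task is taken as $T = X$ for simplicity, and a randomized representation $Y$ is obtained by flipping $X$ with probability `delta`. Increasing the randomization lowers the leakage $I(Y;S)$ at the cost of utility $I(Y;T)$, illustrating the trade-off that the paper's bounds characterize.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits, computed from a joint pmf matrix p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal p(a)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal p(b)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

# Hypothetical joint pmf p(s, x): X is a noisy copy of the sensitive S.
p_sx = np.array([[0.4, 0.1],
                 [0.1, 0.4]])  # rows index s, columns index x

def evaluate(delta):
    """Y flips X with prob. delta; return (I(Y;T), I(Y;S), I(Y;X)) with T = X."""
    p_y_given_x = np.array([[1 - delta, delta],
                            [delta, 1 - delta]])
    p_sy = p_sx @ p_y_given_x                 # joint p(s, y)
    p_x = p_sx.sum(axis=0)                    # marginal p(x)
    p_xy = p_x[:, None] * p_y_given_x         # joint p(x, y)
    i_yx = mutual_information(p_xy)           # rate I(Y;X)
    i_ys = mutual_information(p_sy)           # leakage I(Y;S)
    i_yt = i_yx                               # T = X here, so I(Y;T) = I(Y;X)
    return i_yt, i_ys, i_yx

# More randomization => less leakage, but also less utility.
u0, leak0, _ = evaluate(0.0)   # deterministic Y = X
u1, leak1, _ = evaluate(0.2)   # randomized Y
assert leak1 < leak0 and u1 < u0
```

With this toy model, any target leakage level $\epsilon$ can be met by choosing `delta` large enough, and the resulting $I(Y;T)$ is the utility attained at that leakage; the paper's contribution is to bound the best achievable utility over all randomized mechanisms, not just this one-parameter family.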