Is Fairness Truly Fair? Towards Reliable Lipschitz Fairness in Multi-Task Learning via Fixed-δ Alignment

📅 2026-06-09

🤖 AI Summary

This work addresses an overlooked problem in multi-task learning: Lipschitz individual fairness assessments become distorted when the fairness threshold is derived from each model's own representation scale, so that different methods are audited under thresholds with different semantic meanings. To resolve this "threshold confounding," the authors propose ReLiF, a framework that audits all methods under a single fixed δ so that fairness evaluations are semantically consistent. ReLiF separates training from evaluation: during training, a violation-rate feedback controller and a Huber-smoothed surrogate loss keep the Lipschitz constraint active without letting it dominate optimization. Experiments on NYUv2 and clinical time-series datasets show that ReLiF substantially reduces aligned bias on NYUv2 and that fixed-δ auditing exposes utility–fairness trade-offs that method-dependent thresholds can hide.
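The two training-time components named above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' formulation: the piecewise form of the smoothed penalty, the additive update rule, the function names `huberized_positive_margin` and `update_penalty_weight`, and all default values (`kappa`, `target_rate`, `step`, `max_weight`) are hypothetical.

```python
def huberized_positive_margin(margin, kappa=0.1):
    """Smoothed version of max(0, margin); an assumed form, not the paper's.

    Zero for non-positive margins, quadratic on (0, kappa], linear above kappa.
    Value and slope are continuous at both breakpoints.
    """
    if margin <= 0.0:
        return 0.0
    if margin <= kappa:
        return margin * margin / (2.0 * kappa)
    return margin - kappa / 2.0


def update_penalty_weight(weight, violation_rate,
                          target_rate=0.05, step=0.5, max_weight=10.0):
    """Hypothetical feedback rule for the fairness penalty weight.

    The weight rises when the observed violation rate exceeds the target,
    falls when it is below the target, and is clipped to [0, max_weight]
    so the penalty cannot dominate the task losses.
    """
    new_weight = weight + step * (violation_rate - target_rate)
    return min(max(new_weight, 0.0), max_weight)


# Example: one controller step after a mini-batch where 25% of pairs violate.
weight = update_penalty_weight(1.0, violation_rate=0.25)
penalty = weight * huberized_positive_margin(0.3)
print(weight, penalty)
```

In this sketch the clipping bound is what stops the penalty from dominating, and the quadratic region near zero keeps gradients small for pairs that barely violate the constraint.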

📝 Abstract

Lipschitz-style individual fairness formalizes the idea that semantically similar examples should receive similar predictions, but its evaluation in multi-task learning (MTL) can be confounded by method-induced representation scales. This paper identifies threshold confounding: when the auditing tolerance is derived from each model's own representation distances, different algorithms are compared under different semantic thresholds. A threshold-drift analysis further shows how bias rankings can change and identifies sufficient conditions for ranking preservation. We propose ReLiF, a reliability-aware framework that separates evaluation-time fixed-δ auditing from training-time controlled regularization. ReLiF uses a shared reference tolerance for comparable auditing and a violation-rate feedback controller to keep the Lipschitz surrogate active without letting it dominate stochastic training. This work also develops supporting analysis for threshold drift, reference-tolerance selection, and the relationship between the Huberized training surrogate and its unsmoothed positive-margin counterpart. Experiments on clinical time-series benchmarks and NYUv2 (NYU Depth V2) dense prediction show that fixed-δ auditing exposes utility–fairness trade-offs that method-dependent thresholds can obscure. On NYUv2 with a ResNet50 backbone, ReLiF achieves competitive utility while substantially reducing aligned bias under shared fixed thresholds. On clinical benchmarks, ReLiF yields controlled fairness-regularized trade-offs, while fixed-δ auditing reveals that task-balancing baselines can sometimes achieve lower bias and that genuine utility–fairness trade-offs persist. These results support fixed-δ auditing as a semantically consistent protocol for evaluating Lipschitz fairness in MTL.
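A toy Python sketch may help make threshold confounding concrete. It is an illustration under stated assumptions, not the paper's auditing protocol: the median-based per-model tolerance, the mean-prediction-gap statistic, the function names, the toy data, and the value of `FIXED_DELTA` are all hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import median


def pairwise_distances(reps):
    """Euclidean distance between every unordered pair of representations."""
    return {(i, j): dist(reps[i], reps[j])
            for i, j in combinations(range(len(reps)), 2)}


def self_derived_delta(reps):
    """Hypothetical per-model tolerance: median of the model's own distances."""
    return median(pairwise_distances(reps).values())


def audit(reps, preds, delta):
    """Return (mean prediction gap, pair count) over pairs within delta.

    The mean gap is None when no pair falls within the tolerance.
    """
    gaps = [abs(preds[i] - preds[j])
            for (i, j), d in pairwise_distances(reps).items() if d <= delta]
    return (sum(gaps) / len(gaps) if gaps else None), len(gaps)


# Toy data (made up): model B's representations are model A's scaled by 10.
reps_a = [[0.0], [1.0], [2.0], [4.0]]
reps_b = [[10.0 * x for x in r] for r in reps_a]
preds = [0.0, 0.1, 0.5, 0.9]

# Self-derived tolerances differ only because the representation scale differs.
print(self_derived_delta(reps_a), self_derived_delta(reps_b))

# A shared fixed delta applies the same numeric tolerance to both models.
FIXED_DELTA = 15.0  # arbitrary value chosen for illustration
print(audit(reps_a, preds, FIXED_DELTA), audit(reps_b, preds, FIXED_DELTA))
```

In this toy setting the two models receive different self-derived tolerances even though their predictions are identical, whereas the shared tolerance audits both against the same numeric threshold and so selects different pair sets for the rescaled model.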

Problem

Research questions and friction points this paper is trying to address.

Lipschitz fairness

multi-task learning

threshold confounding

individual fairness

fairness evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lipschitz fairness

multi-task learning

fixed-delta auditing