🤖 AI Summary
To address the challenge of modeling fairness with continuous protected attributes (e.g., age, income) in machine learning, this paper proposes a generalized nullspace projection framework. It is the first to extend nullspace projection to reproducing kernel Hilbert spaces (RKHS), which enables continuous fairness constraints in regression tasks. By leveraging kernel embeddings, the framework models protected attributes non-linearly while remaining agnostic to both the model and the fairness measure. Integrated with support vector regression (SVR), it performs well empirically across multiple benchmark datasets: it maintains predictive accuracy while significantly improving continuous fairness metrics such as fairness mean squared error, and it matches or outperforms state-of-the-art baselines based on discretization and regularization. The core contribution is a scalable, theoretically grounded, plug-and-play kernel-based solution for continuous fairness that bridges a critical gap in algorithmic fairness research.
📝 Abstract
With the ongoing integration of machine learning systems into the everyday social life of millions, fairness becomes an ever-increasing priority in their development. Fairness notions commonly rely on protected attributes to assess potential biases. The majority of the literature focuses on discrete setups for both target and protected attributes. The literature on continuous attributes, especially in conjunction with regression (we refer to this as continuous fairness), is scarce. A common strategy is iterative null-space projection, which so far has only been explored for linear models or for embeddings such as those obtained from a non-linear encoder. We improve on this by generalizing to kernel methods, significantly extending the scope. This yields a model-agnostic and fairness-score-agnostic method for kernel embeddings that is applicable to continuous protected attributes. We demonstrate that our novel approach, in conjunction with Support Vector Regression (SVR), provides competitive or improved performance across multiple datasets in comparison to other contemporary methods.
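To make the idea of iterative null-space projection on kernel embeddings more concrete, the following is a minimal NumPy sketch. It is not the paper's algorithm. It assumes one simple way to obtain finite-dimensional kernel features (the leading components of a centered RBF kernel matrix) and then repeatedly removes the direction that best predicts the continuous protected attribute `z` under ridge regression. The function names, the RBF kernel, the ridge predictor, and all parameter values are illustrative assumptions.

```python
import numpy as np


def rbf_kernel_features(X, gamma=1.0, n_components=10):
    """Explicit features from a centered RBF kernel matrix (assumed construction)."""
    sq = np.sum(X ** 2, axis=1)
    dist_sq = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * dist_sq)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the leading components
    vals = np.clip(vals[idx], 0.0, None)
    return vecs[:, idx] * np.sqrt(vals)


def nullspace_projection(w):
    """Orthogonal projection onto the null space of the direction w."""
    w = np.asarray(w, dtype=float).reshape(-1)
    return np.eye(w.size) - np.outer(w, w) / (w @ w)


def iterative_nullspace_projection(F, z, n_iter=3, ridge=1e-3):
    """Return a projection P such that F @ P carries less linear information on z.

    Each round fits a ridge regressor from the projected features to the
    centered protected attribute and projects out its weight vector.
    """
    F = F - F.mean(axis=0)
    z = z - z.mean()
    d = F.shape[1]
    P = np.eye(d)
    for _ in range(n_iter):
        Fp = F @ P
        w = np.linalg.solve(Fp.T @ Fp + ridge * np.eye(d), Fp.T @ z)
        if np.linalg.norm(w) < 1e-12:            # nothing left to remove
            break
        P = P @ nullspace_projection(w)
    return P


# Illustrative usage on synthetic data (hypothetical setup).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
z_demo = X_demo[:, 0] + 0.1 * rng.normal(size=60)    # continuous protected attribute
F_demo = rbf_kernel_features(X_demo, gamma=0.5, n_components=8)
P_demo = iterative_nullspace_projection(F_demo, z_demo, n_iter=3)
F_fair = F_demo @ P_demo                             # features with z-predictive directions removed
```

The projected features `F_fair` could then be passed to any downstream regressor, for example `sklearn.svm.SVR`, which corresponds to the SVR setting the abstract mentions. How the paper itself constructs the projection in the RKHS may differ from this sketch.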