🤖 AI Summary
This work addresses a limitation of standard clustering objectives such as k-means, which can produce clusterings that treat demographic groups unequally. Existing fair clustering methods typically optimize a single fairness criterion and often overlook how clustering cost interacts with the geometry of the induced decision boundaries. The paper introduces UniFair, a unified framework that jointly optimizes separation fairness (encouraging protected groups to lie farther from the decision boundaries) and social fairness (penalizing group-wise clustering costs to reduce disparities in within-cluster distortion). The authors develop gradient-based optimization procedures for separation-fair and unified k-means objectives, then extend them to deep clustering by enforcing the same criteria in the latent space of an autoencoder. Experiments on tabular and image datasets show that UniFair reduces both boundary-related and cost-based group disparities with only a modest increase in clustering loss.
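The social fairness criterion compares clustering costs across groups. The abstract does not give the exact formula, so the following is a minimal NumPy sketch under one plausible assumption: each group's cost is the average squared distance of its members to their assigned centroid, and disparity is the gap between the worst-off and best-off group. The helper names `group_costs` and `social_fairness_gap` are hypothetical and are not taken from UniFair.

```python
import numpy as np

def group_costs(X, C, groups):
    """Average k-means cost per group (hypothetical helper, illustration only).

    X: (n, d) points, C: (k, d) centroids, groups: (n,) protected-group labels.
    Each point is assigned to its nearest centroid; its cost is the squared
    distance to that centroid.
    """
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    point_cost = d2.min(axis=1)                                # cost under nearest assignment
    return {g.item(): float(point_cost[groups == g].mean()) for g in np.unique(groups)}

def social_fairness_gap(X, C, groups):
    """Gap between the highest and lowest group-wise average cost (assumed metric)."""
    costs = group_costs(X, C, groups)
    return max(costs.values()) - min(costs.values()), costs

# Toy example: group 1 sits in a looser cluster, so it pays a higher average cost.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
C = np.array([[0.5, 0.0], [11.0, 0.0]])
groups = np.array([0, 0, 1, 1])
gap, costs = social_fairness_gap(X, C, groups)
print(costs, gap)  # {0: 0.25, 1: 1.0} 0.75
```

A social-fairness regularizer in this spirit would add a penalty such as the maximum group cost (or this gap) to the usual k-means loss, so that lowering the overall cost cannot come entirely at one group's expense.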
📝 Abstract
Clustering is increasingly used to support high-impact decisions, yet standard objectives such as $k$-means can produce clusterings that treat demographic groups unequally. Existing fair clustering methods typically optimize a single notion of fairness and often overlook how clustering costs interact with the geometry of the induced decision boundaries. We propose \textsc{UniFair}, a unified framework that jointly optimizes \emph{separation fairness} and \emph{social fairness}. Separation fairness encourages protected groups to lie farther from the induced decision boundaries, while social fairness reduces disparities in within-cluster distortion by penalizing group-wise clustering costs. We develop gradient-based optimization procedures for separation-fair and unified $k$-means objectives, and extend them to deep clustering by enforcing the same criteria in the latent space of an autoencoder. Experiments on tabular and image datasets show that \textsc{UniFair} reduces both boundary-related and cost-based group disparities with only a modest increase in clustering loss.
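Separation fairness concerns how close each group lies to the decision boundaries induced by the centroids. For k-means, these boundaries are the faces of the Voronoi cells. The sketch below measures each point's distance to the boundary of its own cell. It relies on a standard geometric fact: the bisector between centroids $c_a$ and $c_b$ is a hyperplane, and a point's distance to the cell boundary is the minimum of its distances to these bisectors. Averaging by group is an assumed way to compare groups. The function names are hypothetical, and this is not presented as UniFair's actual objective.

```python
import numpy as np

def boundary_margins(X, C):
    """Distance from each point to the boundary of its k-means (Voronoi) cell.

    For x assigned to centroid a, the distance to the bisector with centroid b is
    (||x - c_b||^2 - ||x - c_a||^2) / (2 ||c_a - c_b||). The cell is an intersection
    of half-spaces, so the distance to its boundary is the minimum over b != a.
    Assumes all centroids are distinct and k >= 2.
    """
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)          # (n, k)
    a = d2.argmin(axis=1)                                              # assigned cluster
    cdist = np.sqrt(((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))  # (k, k)
    n, k = d2.shape
    margins = np.empty(n)
    for i in range(n):
        margins[i] = min(
            (d2[i, b] - d2[i, a[i]]) / (2.0 * cdist[a[i], b])
            for b in range(k) if b != a[i]
        )
    return margins

def group_mean_margin(margins, groups):
    """Average boundary margin per protected group (hypothetical helper)."""
    return {g.item(): float(margins[groups == g].mean()) for g in np.unique(groups)}

# Toy example: two centroids with the bisector at x = 5; group 1 sits near it.
X = np.array([[0.0, 0.0], [4.0, 0.0], [6.0, 0.0], [10.0, 0.0]])
C = np.array([[2.0, 0.0], [8.0, 0.0]])
groups = np.array([0, 1, 1, 0])
m = boundary_margins(X, C)
print(m, group_mean_margin(m, groups))  # [5. 1. 1. 5.] {0: 5.0, 1: 1.0}
```

In this toy case group 1 lies much closer to the boundary, so small perturbations could flip its members' cluster assignments. A separation-fair objective would push such a group's margins upward, for example by penalizing a low minimum group-wise margin alongside the clustering loss. Because the margin is piecewise differentiable in the centroids, it fits the gradient-based optimization the abstract describes.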