A Computational Approach to Improving Fairness in K-means Clustering

📅 2025-05-29
🤖 AI Summary
K-means clustering often compromises group fairness because sensitive attributes (e.g., gender, race) can be unevenly distributed across clusters. To address this, we propose a two-stage fair optimization framework: first run standard K-means, then reassign only a small number of critical samples to improve fairness. We introduce a novel “fairness cost” metric and design two efficient sample selection strategies, neighborhood boundary point identification and high-mixing-degree point selection, that improve fairness while preserving clustering quality. The method combines subgroup distribution constraints, local neighborhood search, and quantitative mixing-degree analysis, and it extends to other clustering algorithms and fairness metrics. Experiments on multiple benchmark datasets show substantial improvements in fairness metrics (e.g., balance, statistical parity) with less than 1% degradation in the clustering objective value, supporting the method's effectiveness and practicality.
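Balance, one of the fairness metrics named in the summary, can be made concrete with a short Python sketch. It uses one common definition for a binary sensitive attribute: per cluster, the smaller of the two group ratios, and overall, the minimum across clusters. The exact definition and normalization used in the paper are not given on this page, so the function name `cluster_balance` and its two-group restriction are assumptions for illustration.

```python
from collections import Counter


def cluster_balance(labels, groups):
    """Balance of a clustering with respect to a binary sensitive attribute.

    Assumed definition: for each cluster, min(n_a / n_b, n_b / n_a), where
    n_a and n_b count the members of the two groups (0.0 if a group is
    absent). The overall balance is the minimum over clusters, so 1.0 means
    every cluster is evenly split and 0.0 means some cluster holds one group.
    """
    values = sorted(set(groups))
    if len(values) != 2:
        raise ValueError("this sketch assumes exactly two sensitive groups")
    a, b = values
    counts = {}
    for label, group in zip(labels, groups):
        counts.setdefault(label, Counter())[group] += 1
    per_cluster = {}
    for label, c in counts.items():
        n_a, n_b = c[a], c[b]
        per_cluster[label] = 0.0 if n_a == 0 or n_b == 0 else min(n_a / n_b, n_b / n_a)
    return min(per_cluster.values()), per_cluster


# Usage: cluster 0 holds 3 "F" and 1 "M"; cluster 1 holds 1 "F" and 3 "M".
overall, per_cluster = cluster_balance(
    [0, 0, 0, 0, 1, 1, 1, 1],
    ["F", "F", "F", "M", "M", "M", "M", "F"],
)
print(overall)  # 0.3333333333333333
```

Because the overall score is a minimum, a single lopsided cluster dominates it, which is why reassigning even a few points in the worst cluster can raise balance noticeably.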

📝 Abstract
The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation: some clusters may contain disproportionately more (or fewer) points from one of the subpopulations defined by a sensitive variable, e.g., gender or race. Such a fairness issue may cause bias and unexpected social consequences. This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation: clustering first, and then adjusting the cluster membership of a small subset of selected data points. Two computationally efficient algorithms are proposed for identifying the data points that are expensive for fairness, one focusing on the nearest data points outside a cluster and the other on highly 'mixed' data points. Experiments on benchmark datasets show a substantial improvement in fairness with minimal impact on clustering quality. The proposed algorithms can be easily extended to a broad class of clustering algorithms or fairness metrics.

Problem
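The two-stage formulation in the abstract (cluster first, then adjust the membership of a few points) can be illustrated with a small, standard-library-only Python sketch. It is not the authors' algorithm: stage 1 is plain Lloyd's K-means, and stage 2 is a hypothetical greedy heuristic loosely in the spirit of the "nearest data points outside of a cluster" strategy. The fairness cost `parity_deviation`, the move `budget`, the 0/1 `protected` flags, and the choice to keep centroids fixed during stage 2 are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Stage 1: plain Lloyd's K-means, with no fairness constraint."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are already their means
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def parity_deviation(labels, protected, k):
    """Hypothetical fairness cost: sum over clusters of |a_c - p * n_c|, where
    n_c is the cluster size, a_c its count of protected points (0/1 flags)
    and p the overall protected proportion. 0.0 means perfect parity."""
    p = sum(protected) / len(protected)
    n, a = [0] * k, [0] * k
    for lab, s in zip(labels, protected):
        n[lab] += 1
        a[lab] += s
    return sum(abs(a[c] - p * n[c]) for c in range(k))


def fair_reassign(points, labels, centers, protected, budget):
    """Stage 2 (hypothetical greedy heuristic): try moving points to another
    cluster in order of how little their squared distance to the assigned
    centroid would grow (boundary points first), and keep a move only if it
    lowers parity_deviation. Centroids stay fixed; each point moves at most
    once; at most `budget` moves are accepted."""
    k = len(centers)
    labels = list(labels)
    candidates = []
    for i, x in enumerate(points):
        own = sq_dist(x, centers[labels[i]])
        for d in range(k):
            if d != labels[i]:
                candidates.append((sq_dist(x, centers[d]) - own, i, d))
    candidates.sort()
    sizes = [labels.count(c) for c in range(k)]
    current = parity_deviation(labels, protected, k)
    moved = set()
    for _, i, d in candidates:
        if len(moved) >= budget:
            break
        src = labels[i]
        if i in moved or sizes[src] <= 1:
            continue  # never move a point twice or empty a cluster
        labels[i] = d
        trial = parity_deviation(labels, protected, k)
        if trial < current - 1e-12:
            current = trial
            moved.add(i)
            sizes[src] -= 1
            sizes[d] += 1
        else:
            labels[i] = src  # revert: this move does not improve fairness
    return labels, moved


# Usage: two well-separated groups on a line, each holding one subpopulation.
pts = [(float(x), 0.0) for x in (0, 1, 2, 3, 10, 11, 12, 13)]
prot = [1, 1, 1, 1, 0, 0, 0, 0]
labels, centers = kmeans(pts, k=2)
new_labels, moved = fair_reassign(pts, labels, centers, prot, budget=2)
print(sorted(moved), parity_deviation(new_labels, prot, 2))  # [3, 4] 2.0
```

Ordering candidates by the increase in squared distance tries first the points whose move costs the least clustering quality, which is one way to trade a small loss in the K-means objective for fairer membership; a budget of 4 on the same data brings the assumed parity cost to 0.0.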

Addressing fairness issues in K-means clustering
Reducing bias from sensitive variables like gender or race
Optimizing cluster membership with minimal impact on clustering quality

Innovation

Two-stage optimization for fairness improvement
Efficient algorithms for identifying data points that are expensive for fairness
Extensible to various clustering algorithms and fairness metrics
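The second selection strategy targets highly "mixed" data points. This page does not define mixing, so the Python sketch below assumes one plausible proxy: the fraction of a point's `m` nearest neighbours that carry a different cluster label. The names `mixing_degree` and `most_mixed` and the parameter `m` are hypothetical; a variant could instead measure mixing over the sensitive attribute.

```python
import math


def mixing_degree(points, labels, m=3):
    """Hypothetical 'mixing' score: the fraction of a point's m nearest
    neighbours (excluding itself) whose cluster label differs from its own."""
    scores = []
    for i, x in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # sorted() is stable, so equidistant neighbours keep index order
        nearest = sorted(others, key=lambda j: math.dist(x, points[j]))[:m]
        foreign = sum(1 for j in nearest if labels[j] != labels[i])
        scores.append(foreign / len(nearest))
    return scores


def most_mixed(points, labels, top, m=3):
    """Indices of the `top` most mixed points, highest score first (ties by
    index): hypothetical candidates for stage-2 reassignment."""
    scores = mixing_degree(points, labels, m)
    return sorted(range(len(points)), key=lambda i: (-scores[i], i))[:top]


# Usage: the points at x = 5 and x = 8 sit at the border of two clusters.
pts = [(float(x), 0.0) for x in (0, 1, 2, 5, 8, 9, 10, 11)]
labs = [0, 0, 0, 0, 1, 1, 1, 1]
print(most_mixed(pts, labs, top=2))  # [3, 4]
```

The brute-force neighbour search is quadratic in the number of points; for large datasets a spatial index such as a k-d tree would be the usual replacement.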

Authors
Guancheng Zhou
Mathematics and Data Science, University of Massachusetts Dartmouth, MA
Haiping Xu
Professor of Computer and Information Science, University of Massachusetts Dartmouth
Software Engineering, Distributed Computing, Mobile Cloud Computing
Hongkang Xu
Charlton College of Business, University of Massachusetts Dartmouth, MA
Chenyu Li
Data Science, Columbia University
Donghui Yan
Unknown affiliation
Statistics, machine learning, data mining