SOMtime the World Ain't Fair: Violating Fairness Using Self-Organizing Maps

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the common assumption that unsupervised representation learning is inherently fair when sensitive attributes are withheld from the input. We propose SOMtime, a method that constructs topology-preserving unsupervised embeddings using high-capacity self-organizing maps (SOMs), and demonstrate for the first time that such representations can encode ordinal sensitive attributes, such as age and income, as dominant latent axes, which in turn induces significant demographic bias in downstream clustering. On the World Values Survey and Census-Income datasets, SOMtime embeddings reach Spearman correlations with withheld sensitive attributes as high as 0.85, well above PCA, UMAP, t-SNE, and autoencoders (all ≤ 0.34). These findings underscore the need to incorporate fairness audits into the unsupervised components of machine learning pipelines.

📝 Abstract
Unsupervised representations are widely assumed to be neutral with respect to sensitive attributes when those attributes are withheld from training. We show that this assumption is false. Using SOMtime, a topology-preserving representation method based on high-capacity Self-Organizing Maps, we demonstrate that sensitive attributes such as age and income emerge as dominant latent axes in purely unsupervised embeddings, even when explicitly excluded from the input. On two large-scale real-world datasets (the World Values Survey across five countries and the Census-Income dataset), SOMtime recovers monotonic orderings aligned with withheld sensitive attributes, achieving Spearman correlations of up to 0.85. By comparison, PCA and UMAP typically remain below 0.23 (with a single exception reaching 0.31), and t-SNE and autoencoders achieve at most 0.34. Furthermore, unsupervised segmentation of SOMtime embeddings produces demographically skewed clusters, demonstrating downstream fairness risks without any supervised task. These findings establish that fairness through unawareness fails at the representation level for ordinal sensitive attributes and that fairness auditing must extend to unsupervised components of machine learning pipelines. We have made the code available at https://github.com/JosephBingham/SOMtime
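The audit described in the abstract, correlating each embedding axis with a sensitive attribute that was withheld from training, can be illustrated with a minimal sketch. This is not the authors' code and does not use SOMtime; the helper names (`average_ranks`, `spearman`, `audit_embedding`) and the synthetic "age" data are assumptions made purely for illustration.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def audit_embedding(embedding, sensitive):
    """Absolute Spearman correlation of every embedding axis with a
    sensitive attribute that was NOT part of the embedding's input."""
    n_axes = len(embedding[0])
    return [
        abs(spearman([row[d] for row in embedding], sensitive))
        for d in range(n_axes)
    ]


# Synthetic illustration only (hypothetical data, not from the paper):
# axis 0 is a noisy monotonic function of age, axis 1 is pure noise.
rng = random.Random(0)
age = [rng.randint(18, 90) for _ in range(500)]
embedding = [[a ** 0.5 + rng.gauss(0, 0.5), rng.gauss(0, 1)] for a in age]

scores = audit_embedding(embedding, age)
for axis, score in enumerate(scores):
    print(f"axis {axis}: |Spearman rho| = {score:.2f}")
```

Spearman correlation is a natural fit for this kind of audit because it only asks whether an axis orders the samples the same way the ordinal attribute does, regardless of the scale or nonlinearity of the embedding. In practice the `embedding` rows would come from whatever representation is being audited, and `sensitive` from the held-out column.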
Problem

Research questions and friction points this paper is trying to address.

fairness
unsupervised representation
sensitive attributes
self-organizing maps
representation bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Organizing Maps
unsupervised representation
algorithmic fairness
latent bias
fairness through unawareness
Joseph Bingham
Technion-Israel Institute of Technology, Haifa, Israel
Netanel Arussy
Technion-Israel Institute of Technology, Haifa, Israel
Dvir Aran
Assistant Professor @ Technion - Israel Institute of Technology
Computational Biology, Single Cell Genomics, Cancer Genomics, Clinical Informatics