Causally-Aware Unsupervised Feature Selection Learning

📅 2024-10-16

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing unsupervised feature selection methods overlook the intrinsic causal mechanisms underlying data, leading to confounding bias, feature redundancy, distorted similarity graphs, and limited interpretability. To address these issues for unlabeled high-dimensional data, this paper proposes a causality-driven unsupervised feature selection framework. First, a causal regularization-based reweighting strategy is introduced to correct sample distributions and mitigate confounding effects. Second, a causal-guided hierarchical clustering module is integrated with a multi-granularity adaptive similarity graph fusion mechanism to explicitly disentangle causal versus non-causal feature contributions to graph structure. Finally, causal feature importance modeling enables interpretable feature selection. Extensive experiments on multiple benchmark datasets demonstrate significant improvements over state-of-the-art methods. Visualization results further confirm the framework’s strong causal interpretability and faithful preservation of local data structures.
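The reweighting idea can be illustrated with a confounder-balancing loss of the kind common in the causal regularization literature. This is a minimal sketch, not the paper's exact regularizer, which the summary does not spell out. The function name `balancing_loss`, the mean-threshold binarization of each treatment feature, and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np

def balancing_loss(X, w, eps=1e-12):
    """Sum, over every feature treated as the 'treatment', of the squared gap
    between the weighted means of the remaining features (the confounders)
    in the treated group and in the control group."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    # Assumption: a sample is 'treated' on feature j if it exceeds the column mean.
    T = (X > X.mean(axis=0)).astype(float)
    loss = 0.0
    for j in range(X.shape[1]):
        t = T[:, j]
        confounders = np.delete(X, j, axis=1)
        w_treated = w * t
        w_control = w * (1.0 - t)
        mean_treated = confounders.T @ w_treated / (w_treated.sum() + eps)
        mean_control = confounders.T @ w_control / (w_control.sum() + eps)
        loss += float(np.sum((mean_treated - mean_control) ** 2))
    return loss

# Toy data: the duplicated last row makes the two features correlated.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1]], dtype=float)
uniform_w = np.ones(5)
balanced_w = np.array([1.0, 1.0, 1.0, 0.5, 0.5])  # down-weights the duplicated rows

print(balancing_loss(X_toy, uniform_w))
print(balancing_loss(X_toy, balanced_w))
```

In the toy data, halving the weight of the two duplicated samples equalizes the confounder means across groups, so the loss under `balanced_w` is lower than under uniform weights. In a full method the weights would be learned jointly with the selection model rather than set by hand.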

📝 Abstract

Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
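The multi-granularity graph fusion described in the abstract can be sketched as follows, under simplifying assumptions. Features are split into groups by a given causal-importance score (a plain rank-based split standing in for the causality-guided hierarchical clustering), a fixed Gaussian-kernel graph is built per group (standing in for the adaptively learned graphs), and the graphs are averaged with weights proportional to each group's mean score. The names `gaussian_graph` and `fuse_graphs`, and the parameters `scores`, `n_groups`, and `sigma`, are hypothetical.

```python
import numpy as np

def gaussian_graph(X, sigma=1.0):
    """Dense Gaussian-kernel similarity graph with a zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S

def fuse_graphs(X, scores, n_groups=2, sigma=1.0):
    """Fuse per-group similarity graphs, weighting each group by its mean score.

    Assumption: `scores` are non-negative causal-importance values with a
    positive sum, one per feature.
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]          # most causal features first
    groups = np.array_split(order, n_groups)  # stand-in for hierarchical clustering
    group_score = np.array([scores[g].mean() for g in groups])
    alpha = group_score / group_score.sum()   # fusion weights, sum to 1
    S = sum(a * gaussian_graph(X[:, g], sigma) for a, g in zip(alpha, groups))
    return S, alpha

# Feature 0 (high score) groups samples {0, 1} and {2, 3};
# feature 1 (low score) would instead link samples 0 and 2.
X_demo = np.array([[0, 0], [0, 5], [5, 0], [5, 5]], dtype=float)
S_fused, alpha = fuse_graphs(X_demo, scores=[0.9, 0.1])
print(alpha)
print(S_fused[0, 1] > S_fused[0, 2])
```

Because the high-score group receives the larger fusion weight, the edge supported by the causal feature dominates the edge supported only by the non-causal one. This is the intended effect of increasing the importance of causal features in the fused graph.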

Problem

Research questions and friction points this paper is trying to address.

Unsupervised Feature Selection

Causal Relationships

False Connections in Similarity Graphs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Regularizer

Hierarchical Clustering

Interpretable Unsupervised Feature Selection
