Feature Subset Weighting for Distance-based Supervised Learning through Choquet Integration

📅 2025-04-01

📈 Citations: 0

✨ Influential: 0

career value

183K/year

🤖 AI Summary

This paper addresses key challenges in distance-based supervised learning—namely, the difficulty of modeling nonlinear feature interactions via subset-level weighting, susceptibility to redundant or highly correlated features, and prohibitive computational complexity. To this end, we propose a Choquet integral-based distance metric with feature subset weighting. Our core contributions are threefold: (1) the first integration of monotone measures with the Choquet integral to enable scalable, subset-level weight learning; (2) the construction of symmetric Choquet distance and similarity measures, unifying their duality; and (3) a reduction of weight learning complexity from *O*(2<sup>*m*</sup>) to *O*(*m*). Evaluated within the *k*-nearest neighbors classification framework, our method significantly outperforms both weighted Euclidean and Mahalanobis distances—particularly on high-redundancy datasets—demonstrating superior stability and classification accuracy.

Technology Category

Application Category

📝 Abstract

This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights should be calculated each time instead of calculating all feature subset weights ($2^m$), where $m$ is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbors (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.

Problem

Research questions and friction points this paper is trying to address.

Weighting feature subsets for distance-based supervised learning

Using Choquet integral to capture non-linear feature interactions

Reducing computational complexity in feature subset weighting

Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature subset weighting via Choquet integral

Non-linear distance metric with attribute interactions

Efficient computation with m instead of 2^m weights

🔎 Similar Papers

No similar papers found.