🤖 AI Summary
This paper addresses key challenges in distance-based supervised learning—namely, the difficulty of modeling nonlinear feature interactions via subset-level weighting, susceptibility to redundant or highly correlated features, and prohibitive computational complexity. To this end, we propose a Choquet integral-based distance metric with feature subset weighting. Our core contributions are threefold: (1) the first integration of monotone measures with the Choquet integral to enable scalable, subset-level weight learning; (2) the construction of symmetric Choquet distance and similarity measures, unifying their duality; and (3) a reduction of weight learning complexity from *O*(2<sup>*m*</sup>) to *O*(*m*). Evaluated within the *k*-nearest neighbors classification framework, our method significantly outperforms both weighted Euclidean and Mahalanobis distances—particularly on high-redundancy datasets—demonstrating superior stability and classification accuracy.
📝 Abstract
This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights should be calculated each time instead of calculating all feature subset weights ($2^m$), where $m$ is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbors (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.