KGroups: A Versatile Univariate Max-Relevance Min-Redundancy Feature Selection Algorithm for High-dimensional Biological Data

📅 2026-03-30
🤖 AI Summary
This work asks how much of a filter feature selection method's predictive performance depends on the selection algorithm itself rather than on its relevance or redundancy estimations. Existing methods either rank features by relevance alone (KBest) or combine relevance maximisation with redundancy minimisation through a multivariate incremental search (mRMR). The authors propose KGroups, a univariate mRMR algorithm that selects features by clustering. On 14 high-dimensional biological benchmark datasets, KGroups achieves predictive performance similar to multivariate mRMR while running up to 821 times faster, and it outperforms KBest. Because KGroups is parameterisable, its predictive performance can be improved further through hyperparameter tuning.
📝 Abstract
This paper proposes a new univariate filter feature selection (FFS) algorithm called KGroups. Most work in the literature focuses on investigating the relevance or redundancy estimations used by feature selection (FS) methods. This line of work has shown promising results and real improvements in the predictive performance of FFS methods. However, limited effort has gone into investigating alternative FFS selection algorithms. This raises the following question: how much of an FFS method's predictive performance depends on the selection algorithm rather than on the relevance or redundancy estimations? Most FFS methods fall into two categories: relevance maximisation (Max-Rel, also known as KBest) or simultaneous relevance maximisation and redundancy minimisation (mRMR). KBest is a univariate FFS algorithm that selects features by sorting them in descending order. mRMR is a multivariate FFS algorithm that selects features through an incremental search. In this paper, we propose a new univariate mRMR algorithm called KGroups that selects features by clustering. Extensive experiments on 14 high-dimensional biological benchmark datasets showed that KGroups achieves predictive performance similar to multivariate mRMR while being up to 821 times faster. Unlike mRMR and KBest, KGroups is parameterisable, which leaves room for further predictive performance improvement through hyperparameter fine-tuning. KGroups also outperforms KBest.
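To make the contrast between the two baseline selection algorithms concrete, the following is a minimal sketch of KBest (sort by relevance) and a greedy incremental mRMR. Several choices here are assumptions for illustration and are not taken from the paper: relevance and redundancy are both measured with absolute Pearson correlation, mRMR uses the difference criterion (relevance minus mean redundancy), and the function names `relevance_scores`, `kbest`, and `greedy_mrmr` are hypothetical. KGroups itself is not sketched, since the abstract does not describe its clustering step.

```python
import numpy as np


def relevance_scores(X, y):
    """Univariate relevance (assumed here): |Pearson correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features (or constant y) get relevance 0
    return np.abs(Xc.T @ yc) / denom


def kbest(X, y, k):
    """KBest / Max-Rel: sort features by relevance (descending) and keep the top k."""
    rel = relevance_scores(X, y)
    order = np.argsort(-rel, kind="stable")
    return [int(j) for j in order[:k]]


def greedy_mrmr(X, y, k):
    """Incremental mRMR: repeatedly add the feature that maximises
    relevance minus mean redundancy with the features already selected."""
    rel = relevance_scores(X, y)
    # Pairwise feature-feature redundancy (assumed here): |Pearson correlation|.
    redundancy = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    selected = [int(np.argmax(rel))]  # start from the most relevant feature
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        mean_red = redundancy[np.ix_(candidates, selected)].mean(axis=1)
        scores = rel[candidates] - mean_red
        selected.append(candidates[int(np.argmax(scores))])
    return selected
```

In this sketch, `kbest` needs only one relevance score per feature, whereas `greedy_mrmr` re-scores every remaining candidate against the selected set on each iteration. When two features are near duplicates and both highly relevant, `kbest` keeps both, while `greedy_mrmr` tends to keep one and then prefer a less correlated feature.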
Problem

Research questions and friction points this paper is trying to address.

feature selection
univariate filter
mRMR
high-dimensional biological data
selection algorithm
Innovation

Methods, ideas, or system contributions that make the work stand out.

KGroups
univariate mRMR
clustering-based selection
feature selection
high-dimensional biological data