🤖 AI Summary
Medical tabular data frequently suffer from missing values, class imbalance, heterogeneous feature types, and many features relative to the number of samples, all of which hinder model performance. To address these issues, we propose AutoML-Med, an automated machine learning (AutoML) tool that searches jointly over preprocessing techniques and predictive models. Specifically, it uses Latin Hypercube Sampling (LHS) to explore the space of preprocessing strategies and Partial Rank Correlation Coefficient (PRCC) analysis to identify the most influential preprocessing steps, which are then fine-tuned. This approach minimizes manual intervention and targets better generalization on sparse and imbalanced clinical datasets. Evaluated in two clinical settings, AutoML-Med achieves higher balanced accuracy and sensitivity than other state-of-the-art AutoML tools; both metrics are crucial for identifying at-risk patients.
📝 Abstract
Medical datasets are typically affected by issues such as missing values, class imbalance, heterogeneous feature types, and a high number of features relative to a small number of samples, which prevent machine learning models from performing well in classification and regression tasks. This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges, minimizing user intervention and identifying the optimal combination of preprocessing techniques and predictive models. AutoML-Med's architecture incorporates Latin Hypercube Sampling (LHS) for exploring preprocessing methods, trains models using selected metrics, and utilizes the Partial Rank Correlation Coefficient (PRCC) for fine-tuned optimization of the most influential preprocessing steps. Experimental results demonstrate AutoML-Med's effectiveness in two different clinical settings, where it achieves higher balanced accuracy and sensitivity, which are crucial for identifying at-risk patients, than other state-of-the-art tools. AutoML-Med's ability to improve prediction results, especially in medical datasets with sparse data and class imbalance, highlights its potential to streamline Machine Learning applications in healthcare.
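The two techniques named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not AutoML-Med's implementation: the three search axes, the `toy_score` function (a synthetic stand-in for training a model and measuring a metric such as balanced accuracy), and all function names are assumptions made for illustration. The sketch draws a Latin Hypercube sample of the unit cube, scores each point, and computes the PRCC of each axis with the score from the inverse of the rank-correlation matrix.

```python
import random
from statistics import mean


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum per axis."""
    columns = []
    for _ in range(n_dims):
        # Draw one value inside each of n_samples equal-width bins, then shuffle
        # so that bins are paired at random across dimensions.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def ranks(values):
    """Rank-transform values to 1..n (ties broken by position; fine for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def prcc(X, y):
    """PRCC of each column of X with y, controlling for the other columns."""
    cols = [ranks(list(c)) for c in zip(*X)] + [ranks(list(y))]
    corr = [[pearson(a, b) for b in cols] for a in cols]
    prec = invert(corr)  # partial correlations follow from the precision matrix
    k = len(cols) - 1    # index of the output variable
    return [-prec[j][k] / (prec[j][j] * prec[k][k]) ** 0.5 for j in range(k)]


rng = random.Random(0)
# Hypothetical reading of the three axes: imputation choice, resampling ratio,
# and fraction of features kept. Each coordinate would be mapped to a concrete setting.
samples = latin_hypercube(200, 3, rng)


def toy_score(x):
    # Synthetic stand-in for "train a model and measure balanced accuracy".
    return 3.0 * x[0] - 1.0 * x[1] + rng.gauss(0.0, 0.3)  # x[2] has no effect


scores = [toy_score(x) for x in samples]
for name, value in zip(["axis_0", "axis_1", "axis_2"], prcc(samples, scores)):
    print(f"{name}: PRCC = {value:+.2f}")
```

In a real pipeline, `toy_score` would be replaced by an actual evaluation (for example, a cross-validated metric for the configuration encoded by each sample), and axes with a large absolute PRCC would be the natural candidates for further tuning.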