AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data

📅 2025-08-04
🤖 AI Summary
Medical tabular data frequently suffer from missing values, class imbalance, feature heterogeneity, and high dimensionality with limited samples—challenges that severely hinder model performance. To address these issues, we propose an automated machine learning (AutoML) framework that jointly optimizes preprocessing and model selection in an end-to-end manner. Specifically, it employs Latin Hypercube Sampling (LHS) for efficient exploration of the preprocessing strategy space and integrates Partial Rank Correlation Coefficient (PRCC) analysis to identify the most influential hyperparameters. This approach minimizes manual intervention while significantly enhancing generalization on sparse and imbalanced clinical datasets. Evaluated on two real-world clinical prediction tasks, our framework consistently outperforms state-of-the-art AutoML tools, achieving substantial improvements in balanced accuracy and sensitivity—particularly in high-risk patient identification.

📝 Abstract
Medical datasets are typically affected by missing values, class imbalance, heterogeneous feature types, and a high number of features relative to a small number of samples, all of which prevent machine learning models from achieving good results in classification and regression tasks. This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges, minimizing user intervention and identifying the optimal combination of preprocessing techniques and predictive models. AutoML-Med's architecture incorporates Latin Hypercube Sampling (LHS) for exploring preprocessing methods, trains models using selected metrics, and uses the Partial Rank Correlation Coefficient (PRCC) to fine-tune the most influential preprocessing steps. Experimental results in two different clinical settings show that AutoML-Med achieves higher balanced accuracy and sensitivity, both crucial for identifying at-risk patients, than other state-of-the-art tools. AutoML-Med's ability to improve prediction results, especially on medical datasets with sparse data and class imbalance, highlights its potential to streamline Machine Learning applications in healthcare.
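To make the LHS step concrete, here is a minimal standard-library sketch of Latin Hypercube Sampling over a preprocessing search space. The search space (`SPACE`), its option names, and the `decode` helper are hypothetical and chosen only for illustration; the abstract does not list the preprocessing methods AutoML-Med actually explores.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum on every axis."""
    columns = []
    for _ in range(n_dims):
        # Draw one random point inside each of the n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # randomize how strata pair up across dimensions
        columns.append(strata)
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]

# Hypothetical preprocessing search space (an assumption, not taken from the paper).
SPACE = {
    "imputer": ["mean", "median", "knn"],
    "scaler": ["none", "standard", "minmax"],
    "resampler": ["none", "oversample", "undersample"],
    "feature_fraction": (0.2, 1.0),  # continuous range
}

def decode(point):
    """Map a point of the unit hypercube to a concrete preprocessing configuration."""
    config = {}
    for u, (name, domain) in zip(point, SPACE.items()):
        if isinstance(domain, list):
            index = min(int(u * len(domain)), len(domain) - 1)
            config[name] = domain[index]
        else:
            low, high = domain
            config[name] = low + u * (high - low)
    return config

rng = random.Random(0)
configs = [decode(p) for p in latin_hypercube(6, len(SPACE), rng)]
for config in configs:
    print(config)
```

With 6 samples and 3 options per categorical step, the stratification means each option is tried exactly twice, which plain random sampling does not guarantee. Each configuration would then be used to preprocess the data and train a model, with the resulting metric recorded for the later sensitivity analysis.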
Problem

Research questions and friction points this paper is trying to address.

Addresses missing values and class imbalance in medical data
Optimizes preprocessing and predictive models automatically
Improves accuracy for at-risk patient identification
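A short illustration of why balanced accuracy and sensitivity, rather than plain accuracy, are the metrics of interest for at-risk patient identification. The data below are synthetic and the helper name is an assumption made for this sketch; it uses the standard definitions (sensitivity = TP / (TP + FN), balanced accuracy = mean of sensitivity and specificity).

```python
def sensitivity_and_balanced_accuracy(y_true, y_pred, positive=1):
    """Compute sensitivity and balanced accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # share of at-risk patients that are caught
    specificity = tn / (tn + fp)  # share of healthy patients correctly cleared
    return sensitivity, (sensitivity + specificity) / 2

# Synthetic imbalanced cohort: 10 at-risk patients (1) and 90 others (0).
y_true = [1] * 10 + [0] * 90
# A degenerate model that never flags anyone.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, bal_acc = sensitivity_and_balanced_accuracy(y_true, y_pred)
print(accuracy, sens, bal_acc)  # 0.9 0.0 0.5
```

The degenerate model scores 90% accuracy while missing every at-risk patient; sensitivity (0.0) and balanced accuracy (0.5) expose the failure.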
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated preprocessing and model optimization
Latin Hypercube Sampling for method exploration
Partial Rank Correlation for fine-tuning steps
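The PRCC idea can be sketched as follows: rank-transform every sampled parameter and the resulting score, then compute each parameter's partial correlation with the score while controlling for the other parameters. This sketch obtains the partial correlations from the inverse of the rank correlation matrix. It is a generic standard-library illustration, not AutoML-Med's implementation; the parameter names and the synthetic score are hypothetical.

```python
import math
import random

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def invert(matrix):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def prcc(inputs, output):
    """inputs: dict name -> sampled values; output: scores. Returns dict name -> PRCC."""
    names = list(inputs)
    cols = [ranks(inputs[name]) for name in names] + [ranks(output)]
    corr = [[pearson(a, b) for b in cols] for a in cols]
    prec = invert(corr)  # precision matrix of the rank-transformed data
    y = len(cols) - 1
    # Partial correlation of input i with the output, given all other inputs.
    return {name: -prec[i][y] / math.sqrt(prec[i][i] * prec[y][y])
            for i, name in enumerate(names)}

# Synthetic demo: the score depends on two hypothetical parameters, not the third.
rng = random.Random(1)
n = 200
params = {name: [rng.random() for _ in range(n)]
          for name in ("imputer_k", "smote_ratio", "scaler_clip")}
score = [2 * a - b + rng.gauss(0, 0.1)
         for a, b in zip(params["imputer_k"], params["smote_ratio"])]
result = prcc(params, score)
for name, value in result.items():
    print(f"{name}: {value:+.2f}")
```

Parameters whose PRCC is close to +1 or -1 are the ones worth fine-tuning further; those near 0 have little monotonic influence on the score once the others are accounted for.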
Riccardo Francia
DISIT, Computer Science Institute, University of Piemonte Orientale, Alessandria, Italy
Maurizio Leone
Fondazione IRCCS, Casa Sollievo della Sofferenza, San Giovanni Rotondo, Foggia, Italy
Giorgio Leonardi
DISIT, Computer Science Institute, University of Piemonte Orientale, Alessandria, Italy
Stefania Montani
Full Professor, University of Piemonte Orientale
Artificial Intelligence, Case Based Reasoning, Bayesian Networks, Temporal Databases
Marzio Pennisi
DISIT, Computer Science Institute, University of Piemonte Orientale, Alessandria, Italy
Manuel Striani
DISIT, Computer Science Institute, University of Piemonte Orientale, Alessandria, Italy
Sandra D’Alfonso
DISS, University of Piemonte Orientale, Novara, Italy