AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical tabular data frequently suffer from missing values, class imbalance, heterogeneous feature types, and high dimensionality with limited samples, all of which degrade model performance. To address these issues, the paper proposes AutoML-Med, an automated machine learning (AutoML) framework that searches for the best combination of preprocessing techniques and predictive models with minimal user intervention. It uses Latin Hypercube Sampling (LHS) to explore the space of preprocessing strategies and Partial Rank Correlation Coefficient (PRCC) analysis to identify the most influential preprocessing steps for fine-tuning. Evaluated in two clinical settings, the framework achieves higher balanced accuracy and sensitivity than other state-of-the-art AutoML tools, which matters most when identifying at-risk patients.

📝 Abstract

Medical datasets are typically affected by issues such as missing values, class imbalance, heterogeneous feature types, and a high number of features relative to a small number of samples, which prevent machine learning models from obtaining proper results in classification and regression tasks. This paper introduces AutoML-Med, an Automated Machine Learning tool specifically designed to address these challenges, minimizing user intervention and identifying the optimal combination of preprocessing techniques and predictive models. AutoML-Med's architecture incorporates Latin Hypercube Sampling (LHS) for exploring preprocessing methods, trains models using selected metrics, and utilizes Partial Rank Correlation Coefficient (PRCC) for fine-tuned optimization of the most influential preprocessing steps. Experimental results demonstrate AutoML-Med's effectiveness in two different clinical settings, where it achieves higher balanced accuracy and sensitivity, both crucial for identifying at-risk patients, than other state-of-the-art tools. AutoML-Med's ability to improve prediction results, especially in medical datasets with sparse data and class imbalance, highlights its potential to streamline Machine Learning applications in healthcare.
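
To make the LHS exploration step concrete, here is a minimal sketch of how Latin Hypercube Sampling could spread a small number of trials evenly over a preprocessing search space. It is not AutoML-Med's implementation: the search space, the option names, and the helpers `latin_hypercube` and `decode` are all hypothetical and chosen only for illustration.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum
    along every dimension (the defining property of LHS)."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one value uniformly inside each of the n_samples equal strata.
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)  # pair strata randomly across dimensions
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical preprocessing search space (assumption, not from the paper).
SEARCH_SPACE = {
    "imputer": ["mean", "median", "knn"],
    "scaler": ["none", "standard", "minmax"],
    "balancer": ["none", "oversample", "undersample"],
    "feature_fraction": (0.2, 1.0),  # continuous range
}


def decode(point, space=SEARCH_SPACE):
    """Map a point of the unit hypercube to a concrete configuration."""
    config = {}
    for u, (name, domain) in zip(point, space.items()):
        if isinstance(domain, list):
            # Split [0, 1) into equal bins, one per categorical option.
            index = min(int(u * len(domain)), len(domain) - 1)
            config[name] = domain[index]
        else:
            low, high = domain
            config[name] = low + u * (high - low)
    return config


configs = [decode(p) for p in latin_hypercube(9, len(SEARCH_SPACE))]
for cfg in configs:
    print(cfg)
```

With 9 samples and 3 options per categorical step, every option is tried the same number of times, which is the coverage guarantee that plain random sampling does not give. Each configuration would then be used to preprocess the data and train a model scored with the chosen metric.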

Problem

Research questions and friction points this paper is trying to address.

Addresses missing values and class imbalance in medical data

Optimizes preprocessing and predictive models automatically

Improves balanced accuracy and sensitivity for at-risk patient identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated preprocessing and model optimization

Latin Hypercube Sampling for method exploration

Partial Rank Correlation Coefficient (PRCC) analysis for fine-tuning the most influential preprocessing steps
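
The PRCC idea named above can be sketched as follows: rank-transform the sampled inputs and the resulting score, then measure how strongly each input correlates with the score once the other inputs are controlled for. This is a generic illustration computed from the inverse of the rank correlation matrix, not the paper's code; the synthetic "score" and the helper names are assumptions.

```python
import math
import random


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def prcc(columns, response):
    """PRCC of each input column with the response, given the other inputs."""
    ranked = [ranks(c) for c in columns] + [ranks(response)]
    k = len(ranked)
    corr = [[pearson(ranked[i], ranked[j]) for j in range(k)]
            for i in range(k)]
    prec = invert(corr)  # precision matrix of the ranked variables
    y = k - 1
    # Partial correlation between variable i and the response.
    return [-prec[i][y] / math.sqrt(prec[i][i] * prec[y][y])
            for i in range(y)]


# Synthetic example (assumption): the score rises with x1, falls with x2,
# and does not depend on x3.
rng = random.Random(0)
n = 200
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
x3 = [rng.random() for _ in range(n)]
score = [math.exp(2 * a) - 0.5 * b + 0.05 * rng.gauss(0, 1)
         for a, b in zip(x1, x2)]

coeffs = prcc([x1, x2, x3], score)
print(dict(zip(["x1", "x2", "x3"], coeffs)))
```

Inputs whose coefficient is far from zero are the ones worth refining further, while those near zero can be left at a default. Because the method works on ranks, it captures monotonic but nonlinear effects such as the exponential dependence on `x1` here.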
