FILM: Framework for Imbalanced Learning Machines based on a new unbiased performance measure and a new ensemble-based technique

📅 2025-03-06

🤖 AI Summary

Standard evaluation metrics (e.g., accuracy, F1-score) are biased under binary class imbalance, which makes model selection unreliable. To address this, the paper proposes a new metric, the Unbiased Integration Coefficients (UIC), and a new ensemble-based algorithm, Identical Partitions for Imbalance Problems (IPIP), both delivered within the FILM framework. UIC aggregates existing metrics while penalising those more prone to imbalance, and it shows significantly reduced bias towards the minority class ($p<10^{-4}$) compared with conventional metrics. IPIP is evaluated with Random Forest and Logistic Regression models. Across seven imbalanced datasets, IPIP outperforms the other baseline imbalance-aware approaches on three, as assessed by UIC. FILM is available as an R package at https://github.com/antoniogt/FILM.

📝 Abstract

This research addresses the challenges of handling unbalanced datasets for binary classification tasks. In such scenarios, standard evaluation metrics are often biased by the disproportionate representation of the minority class. Conducting experiments across seven datasets, we uncovered inconsistencies in evaluation metrics when determining the model that outperforms others for each binary classification problem. This justifies the need for a metric that provides a more consistent and unbiased evaluation across unbalanced datasets, thereby supporting robust model selection. To mitigate this problem, we propose a novel metric, the Unbiased Integration Coefficients (UIC), which exhibits significantly reduced bias ($p<10^{-4}$) towards the minority class compared to conventional metrics. The UIC is constructed by aggregating existing metrics while penalising those more prone to imbalance. In addition, we introduce the Identical Partitions for Imbalance Problems (IPIP) algorithm for imbalanced ML problems, an ensemble-based approach. Our experimental results show that IPIP outperforms other baseline imbalance-aware approaches using Random Forest and Logistic Regression models in three out of seven datasets as assessed by the UIC metric, demonstrating its effectiveness in addressing imbalanced data challenges in binary classification tasks. This new framework for dealing with imbalanced datasets is materialized in the FILM (Framework for Imbalanced Learning Machines) R Package, accessible at https://github.com/antoniogt/FILM.
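The metric bias described in the abstract can be illustrated with a small, self-contained sketch. It is not the UIC formula and does not use the FILM package; the 95/5 class split, the trivial "always predict the majority class" model, and all function names are assumptions chosen only for illustration. It shows how accuracy can look strong while minority-class recall, F1, and balanced accuracy reveal that the model never detects the minority class.

```python
# Illustrative only: hypothetical data and helper names, not the FILM API.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn), treating `positive` as the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def metrics(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    recall = safe_div(tp, tp + fn)          # sensitivity on the minority class
    specificity = safe_div(tn, tn + fp)     # recall on the majority class
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Assumed toy dataset: 5 minority (1) and 95 majority (0) examples.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

scores = metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

On this toy data the trivial model reaches 0.95 accuracy but 0.00 recall and F1 on the minority class and 0.50 balanced accuracy, so different metrics would rank it very differently against a model that actually detects minority cases.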

Problem

Research questions and friction points this paper is trying to address.

Addresses bias in evaluation metrics for imbalanced datasets.

Proposes Unbiased Integration Coefficients (UIC) for consistent model evaluation.

Introduces IPIP algorithm to improve classification in imbalanced data.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unbiased Integration Coefficients (UIC) metric introduced

Identical Partitions for Imbalance Problems (IPIP) algorithm developed

FILM R Package for imbalanced datasets created
