Hybrid Imbalanced Regression Through Unified Data-Level and Algorithm-Level Balancing

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses degraded predictive performance on rare yet critical samples in regression tasks with imbalanced target distributions. The authors propose a unified hybrid framework that combines data-level and algorithm-level balancing in a single regressor-agnostic pipeline. It has five stages: adaptive target binning based on local linear coherence, target-conditioned representation learning with a Conditional Variational Autoencoder, multistage oversampling of minority clusters in feature space, a Latent-Density Weighted Loss (LDWL) that emphasizes samples that are rare in latent and target space, and attention-based gated fusion for the final regression. Experiments on benchmark datasets show consistent improvements over standalone regressors and existing imbalanced regression methods.
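To make the idea of a density-aware loss concrete, here is a minimal sketch of inverse-density sample weighting in target space. It is not the paper's LDWL, whose exact form is not given here: the equal-width bins, the `eps` smoothing term, and the function names `inverse_density_weights` and `weighted_mse` are all assumptions chosen for illustration, and the latent-space part of the weighting is omitted.

```python
import numpy as np

def inverse_density_weights(y, n_bins=10, eps=1e-8):
    """Weight each sample by the inverse empirical density of its target bin.

    Assumption: equal-width bins over the target range. The paper describes
    adaptive bins, which this sketch does not reproduce.
    """
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # Interior edges only, so the maximum target falls in the last bin.
    idx = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    density = counts[idx] / len(y)
    w = 1.0 / (density + eps)
    return w / w.mean()  # normalise so the weights average to 1

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error with per-sample weights."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(weights * (y_true - y_pred) ** 2))

# Hypothetical data: 90 common targets near 0 and 10 rare targets near 10.
y = np.array([0.0] * 90 + [10.0] * 10)
w = inverse_density_weights(y)
print(w[0], w[-1])  # the rare samples receive the larger weight
```

Under this weighting, an error on a rare target contributes more to the loss than the same error on a common one, which is the general effect the summary attributes to the density-aware loss.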

📝 Abstract

Imbalanced learning is a critical challenge in machine learning, where underrepresented target values can bias models and degrade prediction performance on rare but important cases. Although extensively studied in classification, imbalanced regression remains relatively underexplored. Existing methods mainly focus on either data-level balancing, which may introduce noise and overfitting, or algorithm-level balancing, which often struggles with highly complex target distributions. To address these limitations, we propose a unified hybrid framework that integrates both data- and algorithm-level balancing strategies into a regressor-agnostic pipeline. The proposed framework consists of five stages: (1) adaptive bin partitioning to dynamically segment the target space based on local linear coherence; (2) target-conditioned representation learning using a Conditional Variational Autoencoder; (3) multistage data-level balancing through feature-space clustering and oversampling of minority clusters; (4) algorithm-level balancing using a novel Latent-Density Weighted Loss (LDWL) to emphasize rare samples in latent and target spaces; and (5) attention-based gated fusion for final regression. Experimental results on benchmark datasets demonstrate that the proposed framework consistently improves predictive performance compared to standalone regressors and existing imbalanced regression approaches.
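The data-level side of the pipeline can be illustrated with a much simpler stand-in. The sketch below bins the targets and randomly resamples sparse bins up to the size of the largest one. It is only an assumption-laden illustration: the paper uses adaptive bins and cluster-based oversampling in feature space, neither of which is reproduced here, and the name `oversample_rare_bins` and its parameters are hypothetical.

```python
import numpy as np

def oversample_rare_bins(X, y, n_bins=5, seed=0):
    """Randomly duplicate samples from sparse target bins.

    Assumptions: equal-width target bins and plain resampling with
    replacement, used here in place of the paper's adaptive binning and
    cluster-based oversampling.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    idx = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    target = counts.max()
    selected = [np.arange(len(y))]  # keep every original sample
    for b in np.flatnonzero(counts):  # skip empty bins
        deficit = int(target - counts[b])
        if deficit > 0:
            members = np.flatnonzero(idx == b)
            selected.append(rng.choice(members, size=deficit, replace=True))
    sel = np.concatenate(selected)
    return X[sel], y[sel]

# Hypothetical data: 90 common targets and 10 rare ones.
X = np.arange(100).reshape(-1, 1)
y = np.array([0.0] * 90 + [10.0] * 10)
X_bal, y_bal = oversample_rare_bins(X, y)
print(len(y_bal), int((y_bal == 10.0).sum()))
```

Plain duplication like this is exactly the kind of data-level balancing the abstract warns can cause overfitting, which is one motivation for pairing it with an algorithm-level loss.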

Problem

Research questions and friction points this paper is trying to address.

imbalanced regression

data-level balancing

algorithm-level balancing

target distribution

rare samples

Innovation

Methods, ideas, or system contributions that make the work stand out.

imbalanced regression

hybrid balancing

Conditional Variational Autoencoder