Hybrid Imbalanced Regression Through Unified Data-Level and Algorithm-Level Balancing

📅 2026-05-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses degraded predictive performance on rare yet critical samples in regression tasks with imbalanced target distributions. The authors propose a unified hybrid balancing framework that integrates data-level and algorithm-level strategies in a single pipeline compatible with any base regressor. The framework comprises five components: adaptive target binning, Conditional Variational Autoencoder-based representation learning, multistage oversampling, a Latent-Density Weighted Loss (LDWL), and attention-gated fusion. By dynamically partitioning the target space and weighting the loss by sample density, the method increases model sensitivity to samples in the tails of the target distribution. Experiments on benchmark datasets show that the approach consistently improves on standalone regressors and existing imbalanced regression methods.
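The idea of weighting a regression loss by sample density can be illustrated with a minimal sketch. It is not the paper's LDWL, which also uses latent-space density; it only assumes a plain equal-width histogram over the targets and inverse-frequency weights. The function names `density_weights` and `weighted_mse` and the parameter `n_bins` are hypothetical and chosen for illustration.

```python
from collections import Counter


def density_weights(targets, n_bins=10):
    """Inverse-frequency weights from an equal-width histogram of the targets.

    Assumption: target-space density only; this is a stand-in for, not a
    reproduction of, the paper's Latent-Density Weighted Loss.
    """
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all targets are equal
    # Map each target to its bin; clamp the maximum value into the last bin.
    bins = [min(int((y - lo) / width), n_bins - 1) for y in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]  # rarer bins get larger weights
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]  # normalize so the weights average to 1


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error with a per-sample weight on each squared residual."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / len(y_true)


# Four common targets near 1 and one rare target at 9.
y = [1.0, 1.1, 1.2, 1.3, 9.0]
w = density_weights(y, n_bins=2)
loss = weighted_mse(y, [1.0, 1.0, 1.0, 1.0, 1.0], w)
```

With these weights, an error on the rare sample contributes more to the loss than an equally large error on a common sample, which is the general effect a density-aware loss aims for.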
📝 Abstract
Imbalanced learning is a critical challenge in machine learning, where underrepresented target values can bias models and degrade prediction performance on rare but important cases. Although extensively studied in classification, imbalanced regression remains relatively underexplored. Existing methods mainly focus on either data-level balancing, which may introduce noise and overfitting, or algorithm-level balancing, which often struggles with highly complex target distributions. To address these limitations, we propose a unified hybrid framework that integrates both data- and algorithm-level balancing strategies into a regressor-agnostic pipeline. The proposed framework consists of five stages: (1) adaptive bin partitioning to dynamically segment the target space based on local linear coherence; (2) target-conditioned representation learning using a Conditional Variational Autoencoder; (3) multistage data-level balancing through feature-space clustering and oversampling of minority clusters; (4) algorithm-level balancing using a novel Latent-Density Weighted Loss (LDWL) to emphasize rare samples in latent and target spaces; and (5) attention-based gated fusion for final regression. Experimental results on benchmark datasets demonstrate that the proposed framework consistently improves predictive performance compared to standalone regressors and existing imbalanced regression approaches.
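Data-level balancing, as described in the abstract, can be sketched in its simplest form as follows. This is an illustration under stated assumptions, not the paper's method: it groups samples by equal-width target bins rather than by adaptive bins or feature-space clusters, and it oversamples by plain duplication with replacement. The function name `oversample_rare_bins` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def oversample_rare_bins(X, y, n_bins=5, seed=0):
    """Duplicate samples from sparse target bins until all bins are equally sized.

    Assumption: equal-width target bins and random duplication stand in for
    the paper's adaptive binning and cluster-based oversampling.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all targets are equal
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        b = min(int((yi - lo) / width), n_bins - 1)  # clamp max into last bin
        groups[b].append((xi, yi))
    target_size = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(g)  # keep every original sample
        # Draw extra samples with replacement to reach the largest bin size.
        balanced.extend(rng.choice(g) for _ in range(target_size - len(g)))
    X_bal = [xi for xi, _ in balanced]
    y_bal = [yi for _, yi in balanced]
    return X_bal, y_bal


X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [1.0, 1.1, 1.2, 1.3, 9.0]
X_bal, y_bal = oversample_rare_bins(X, y, n_bins=2)
```

Plain duplication is the kind of data-level balancing the abstract warns may introduce noise and overfitting, which is part of the motivation for combining it with an algorithm-level loss.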
Problem

Research questions and friction points this paper is trying to address.

imbalanced regression
data-level balancing
algorithm-level balancing
target distribution
rare samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

imbalanced regression
hybrid balancing
Conditional Variational Autoencoder
Latent-Density Weighted Loss
attention-based fusion