Local distribution-based adaptive oversampling for imbalanced regression

📅 2025-04-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses regression imbalance arising from skewed continuous target variable distributions—characterized by low prediction accuracy in sparse regions and compromised distribution continuity due to threshold-dependent oversampling methods that discard information. To this end, we propose Local Distribution-Adaptive Oversampling (LDAO), a novel framework that eliminates manual rare/frequent threshold specification. LDAO preserves structural locality via local distribution decomposition, estimates density using kernel density estimation with adaptive bandwidth selection, and performs localized resampling followed by weighted merging—yielding a statistically consistent and globally covering balanced training set. As the first regression balancing paradigm grounded in local distribution modeling, LDAO achieves state-of-the-art performance across 45 imbalanced regression benchmarks. It simultaneously improves prediction accuracy in both rare and frequent target regions, reducing average MAE by 12.7%.

Technology Category

Application Category

📝 Abstract
Imbalanced regression occurs when continuous target variables have skewed distributions, creating sparse regions that are difficult for machine learning models to predict accurately. This issue particularly affects neural networks, which often struggle with imbalanced data. While class imbalance in classification has been extensively studied, imbalanced regression remains relatively unexplored, with few effective solutions. Existing approaches often rely on arbitrary thresholds to categorize samples as rare or frequent, ignoring the continuous nature of target distributions. These methods can produce synthetic samples that fail to improve model performance and may discard valuable information through undersampling. To address these limitations, we propose LDAO (Local Distribution-based Adaptive Oversampling), a novel data-level approach that avoids categorizing individual samples as rare or frequent. Instead, LDAO learns the global distribution structure by decomposing the dataset into a mixture of local distributions, each preserving its statistical characteristics. LDAO then models and samples from each local distribution independently before merging them into a balanced training set. LDAO achieves a balanced representation across the entire target range while preserving the inherent statistical structure within each local distribution. In extensive evaluations on 45 imbalanced datasets, LDAO outperforms state-of-the-art oversampling methods on both frequent and rare target values, demonstrating its effectiveness for addressing the challenge of imbalanced regression.
Problem

Research questions and friction points this paper is trying to address.

Addresses imbalanced regression with skewed continuous target variables
Proposes adaptive oversampling without arbitrary rare/frequent thresholds
Improves model performance across entire target value range
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local distribution-based adaptive oversampling (LDAO)
Decomposes dataset into local distributions
Models and samples from each distribution independently
🔎 Similar Papers
No similar papers found.