More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression

📅 2025-09-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inaccurate predictions for minority labels in Deep Imbalanced Regression (DIR), this paper introduces, for the first time, a conditional diffusion model for latent-space data augmentation. Our method employs a priority-based feature generation mechanism: in the latent representation space, it dynamically samples and synthesizes high-fidelity, semantically consistent features for minority-label instances based on label rarity. This approach fills a critical gap in data-level solutions for high-dimensional, unstructured data without modifying the backbone architecture. Evaluated on three standard DIR benchmarks, it reduces minority-region MAE by up to 18.7% while maintaining overall performance stability. Key contributions include: (1) the first conditional diffusion framework tailored for DIR in latent space; (2) a label-aware priority generation strategy that explicitly models label frequency; and (3) an efficient, cross-modal data completion paradigm that generalizes across vision and tabular modalities.

Technology Category

Application Category

📝 Abstract
In many real-world regression tasks, the data distribution is heavily skewed, and models learn predominantly from abundant majority samples while failing to predict minority labels accurately. While imbalanced classification has been extensively studied, imbalanced regression remains relatively unexplored. Deep imbalanced regression (DIR) represents cases where the input data are high-dimensional and unstructured. Although several data-level approaches for tabular imbalanced regression exist, deep imbalanced regression currently lacks dedicated data-level solutions suitable for high-dimensional data and relies primarily on algorithmic modifications. To fill this gap, we propose LatentDiff, a novel framework that uses conditional diffusion models with priority-based generation to synthesize high-quality features in the latent representation space. LatentDiff is computationally efficient and applicable across diverse data modalities, including images, text, and other high-dimensional inputs. Experiments on three DIR benchmarks demonstrate substantial improvements in minority regions while maintaining overall accuracy.
Problem

Research questions and friction points this paper is trying to address.

Addresses deep imbalanced regression with skewed data distributions
Proposes latent diffusion models to augment minority class samples
Enhances prediction accuracy for underrepresented labels in high-dimensional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses conditional diffusion models for data synthesis
Generates features in latent representation space
Applies priority-based generation for minority regions
🔎 Similar Papers
No similar papers found.