Classification Imbalance as Transfer Learning

πŸ“… 2026-01-15
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
This study addresses model bias toward majority classes in imbalanced classification by formally framing it as a label-shift domain-adaptation problem between the source distribution (the observed, imbalanced data) and the target distribution (the balanced distribution under which performance is evaluated). The authors introduce the notion of a "transfer cost" and show theoretically that SMOTE incurs a higher transfer cost than random oversampling (bootstrapping) in moderately high-dimensional settings. Building on this framework, they cast data augmentation as minority-class distribution estimation and empirically demonstrate that random oversampling generally outperforms SMOTE in such settings. These findings offer both theoretical justification and practical guidance for selecting oversampling strategies in imbalanced classification tasks.

πŸ“ Abstract
Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the cost of transfer for SMOTE dominates that of bootstrapping (random oversampling) in moderately high dimensions, suggesting that we should expect bootstrapping to have better performance than SMOTE in general. We corroborate these findings with experimental evidence. More broadly, our results provide guidance for choosing among augmentation strategies for imbalanced classification.
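The two augmentation families the abstract compares can be sketched with a toy implementation (illustrative only: the function names are ours, the data are synthetic, and production SMOTE implementations such as the one in `imbalanced-learn` differ in details like neighbor search and edge handling):

```python
import numpy as np

def bootstrap_oversample(X_min, n_needed, rng):
    """Random oversampling: resample minority points with replacement."""
    idx = rng.integers(0, len(X_min), size=n_needed)
    return X_min[idx]

def smote_like(X_min, n_needed, rng, k=5):
    """SMOTE-style synthesis: interpolate between a minority point and
    one of its k nearest minority-class neighbors."""
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]          # k nearest neighbors per point
    base = rng.integers(0, len(X_min), size=n_needed)
    pick = nbrs[base, rng.integers(0, k, size=n_needed)]
    lam = rng.random((n_needed, 1))              # interpolation weights in [0, 1]
    return X_min[base] + lam * (X_min[pick] - X_min[base])

rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(500, 10))    # majority class
X_min = rng.normal(2.0, 1.0, size=(25, 10))     # minority class, 20:1 imbalance

# generate enough synthetic minority samples to balance the classes
n_needed = len(X_maj) - len(X_min)
X_boot = bootstrap_oversample(X_min, n_needed, rng)
X_smote = smote_like(X_min, n_needed, rng)
```

The contrast matches the paper's framing: bootstrapping draws from the empirical minority distribution (exact copies of observed points), while SMOTE draws from a smoothed estimate supported on segments between neighbors, and the abstract's "cost of transfer" measures how far each estimated distribution sits from the true minority-class distribution.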
Problem

Research questions and friction points this paper is trying to address.

classification imbalance
transfer learning
label shift
oversampling
SMOTE
Innovation

Methods, ideas, or system contributions that make the work stand out.

transfer learning
class imbalance
SMOTE
oversampling
label shift
Eric Xia
Department of Operations Research & Financial Engineering, Princeton University, Princeton NJ
Jason M. Klusowski
Assistant Professor, Department of Operations Research & Financial Engineering
statistics · probability · machine learning · information theory