Enhancing Synthetic Oversampling for Imbalanced Datasets Using Proxima-Orion Neighbors and q-Gaussian Weighting Technique

📅 2025-01-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity of minority-class samples in imbalanced classification, this paper proposes PO-QG, a novel oversampling algorithm. The method selects two minority-class instances, Proxima and Orion, using a combination of majority-class density estimation and relative-distance weights. It then generates synthetic samples with a q-Gaussian weighting mechanism intended to improve the representation and diversity of the minority class. The evaluation covers 50 benchmark datasets (42 from KEEL and 8 from the UCI ML repository), plus a demonstration on a clinical dataset of Indian patients with sarcopenia. According to the Wilcoxon signed-rank test (*p* < 0.05), PO-QG significantly outperforms five existing oversampling methods, with average improvements of 4.2% in F1-score and 3.8% in G-mean.
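
The summary does not give the formulas behind the q-Gaussian step, so the following is only a minimal sketch of how a q-Gaussian weight could drive interpolation between two already-selected anchors. The function names (`q_exponential`, `q_gaussian_weight`, `synthesize`), the parameters `q` and `beta`, and the rule that places each synthetic point at fraction `1 - w` of the way from Proxima to Orion are all assumptions made for illustration, not the authors' method. How Proxima and Orion are chosen is not covered here.

```python
import math
import random


def q_exponential(x, q):
    """Tsallis q-exponential; reduces to exp(x) when q == 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        return 0.0  # outside the compact support that arises for q < 1
    return base ** (1.0 / (1.0 - q))


def q_gaussian_weight(u, q=1.5, beta=1.0):
    """Unnormalized q-Gaussian e_q(-beta * u^2); lies in [0, 1] for beta > 0."""
    return q_exponential(-beta * u * u, q)


def synthesize(proxima, orion, n_new, q=1.5, beta=1.0, seed=0):
    """Place n_new points on the segment from proxima to orion.

    ASSUMPTION (illustrative only): each point sits at fraction t = 1 - w,
    where w is the q-Gaussian weight of a uniform draw u in [0, 1).
    Small u gives w near 1, so points cluster near proxima.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_new):
        u = rng.random()
        t = 1.0 - q_gaussian_weight(u, q, beta)
        samples.append([p + t * (o - p) for p, o in zip(proxima, orion)])
    return samples


if __name__ == "__main__":
    # Hypothetical 2-D anchors standing in for the selected Proxima and Orion.
    for point in synthesize([0.0, 0.0], [1.0, 2.0], n_new=3):
        print(point)
```

In this sketch, values of `q` above 1 give a heavier-tailed weight than the ordinary Gaussian case (`q = 1`), so the choice of `q` changes how far the synthetic points spread away from the first anchor.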

📝 Abstract

In this article, we propose a novel oversampling algorithm to increase the number of minority-class instances in an imbalanced dataset. We select two instances, Proxima and Orion, from the set of all minority-class instances, based on a combination of relative distance weights and density estimation of majority-class instances. The q-Gaussian distribution is then used as a weighting mechanism to produce new synthetic instances that improve representation and diversity. We conduct comprehensive experiments on 42 datasets extracted from the KEEL software and eight datasets from the UCI ML repository to evaluate the usefulness of the proposed algorithm (PO-QG). The Wilcoxon signed-rank test is used to compare the proposed algorithm with five other existing algorithms. The test results show that the proposed technique improves the overall classification performance. We also apply the PO-QG algorithm to a dataset of Indian patients with sarcopenia.
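
The paired comparison described in the abstract can be illustrated with a small standard-library sketch of the two-sided Wilcoxon signed-rank test. This is not the authors' evaluation code: the function name `wilcoxon_signed_rank` is hypothetical, the scores in the usage example are made up, and the p-value uses the plain normal approximation without tie or continuity correction, which is rough for small samples. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test with a normal approximation.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped and tied absolute differences
    receive average ranks (ties are detected by exact equality).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation to the null distribution of W.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p


if __name__ == "__main__":
    # Made-up per-dataset scores for two methods, for illustration only.
    method_a = [0.81, 0.77, 0.90, 0.68, 0.74, 0.85, 0.79, 0.88]
    method_b = [0.78, 0.75, 0.86, 0.69, 0.70, 0.80, 0.73, 0.81]
    print(wilcoxon_signed_rank(method_a, method_b))
```

Each pair of scores comes from the same dataset, which is why a paired test is used here instead of a test that treats the two sets of scores as independent samples.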

Problem

Research questions and friction points this paper is trying to address.

Imbalanced Data

Classification Performance

Data Set Imbalance

Innovation

Methods, ideas, or system contributions that make the work stand out.

PO-QG algorithm

data imbalance problem

Proxima-Orion and q-Gaussian weighting

🔎 Similar Papers

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants