🤖 AI Summary
To address the scarcity of minority-class samples in imbalanced classification, this paper proposes PO-QG, a novel oversampling algorithm. The method selects two anchor instances, Proxima and Orion, from the minority class by combining relative-distance weights with a density estimate of the majority-class instances. Synthetic samples are then generated using the q-Gaussian distribution as a weighting mechanism, with the aim of improving the representation and diversity of the minority class. Evaluation on 50 benchmark datasets (42 from KEEL and 8 from the UCI ML repository), together with a demonstration on a clinical dataset of Indian patients with sarcopenia, shows that PO-QG significantly outperforms five existing oversampling methods (Wilcoxon signed-rank test, *p* < 0.05), with average improvements of 4.2% in F1-score and 3.8% in G-mean.
📝 Abstract
In this article, we propose a novel oversampling algorithm to increase the number of minority-class instances in an imbalanced dataset. We select two instances, Proxima and Orion, from the set of all minority-class instances, based on a combination of relative distance weights and a density estimate of the majority-class instances. The q-Gaussian distribution is then used as a weighting mechanism to produce new synthetic instances that improve the representation and diversity of the minority class. We conduct comprehensive experiments on 42 datasets extracted from the KEEL software and eight datasets from the UCI ML repository to evaluate the usefulness of the proposed algorithm (PO-QG). The Wilcoxon signed-rank test is used to compare the proposed algorithm with five existing algorithms. The test results show that the proposed technique improves overall classification performance. We also apply the PO-QG algorithm to a dataset of Indian patients with sarcopenia.
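To make the q-Gaussian weighting idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how Proxima and Orion are chosen or how the weights enter the generation step, so everything here is an assumption made for illustration: the function names `q_gaussian` and `synthesize`, the parameters `q` and `beta`, the grid of candidate positions, and the choice to centre the weights at the midpoint between the two anchors. The only standard ingredient is the unnormalized q-Gaussian form e_q(-βx²) = [1 - (1 - q)βx²]^(1/(1-q)), which reduces to the ordinary Gaussian kernel as q approaches 1.

```python
import math
import random


def q_gaussian(x, q=1.5, beta=1.0):
    """Unnormalized q-Gaussian e_q(-beta * x^2).

    Reduces to exp(-beta * x^2) in the limit q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(-beta * x * x)
    base = 1.0 - (1.0 - q) * beta * x * x
    if base <= 0.0:
        return 0.0  # outside the compact support that arises when q < 1
    return base ** (1.0 / (1.0 - q))


def synthesize(anchor_a, anchor_b, n, q=1.5, beta=4.0, seed=0):
    """Draw n synthetic points on the segment between two anchor instances.

    ASSUMPTION: positions t in [0, 1] are drawn from a grid with
    q-Gaussian weights centred at the midpoint t = 0.5. The paper's
    actual rule is not given in the abstract.
    """
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]
    weights = [q_gaussian(t - 0.5, q, beta) for t in grid]
    positions = rng.choices(grid, weights=weights, k=n)
    return [
        [a + t * (b - a) for a, b in zip(anchor_a, anchor_b)]
        for t in positions
    ]


if __name__ == "__main__":
    # Two hypothetical minority-class instances standing in for the anchors.
    proxima = [0.0, 0.0]
    orion = [1.0, 2.0]
    for point in synthesize(proxima, orion, n=5):
        print(point)
```

With q > 1 the weights have heavier tails than a Gaussian, so positions near either anchor are still drawn reasonably often. With q < 1 the weights vanish beyond a finite distance from the centre, which confines the synthetic points to a narrower part of the segment.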