Rebalancing the Scales: A Systematic Mapping Study of Generative Adversarial Networks (GANs) in Addressing Data Imbalance

📅 2025-02-23
🤖 AI Summary
Data imbalance severely hampers model performance in critical domains such as healthcare, finance, and cybersecurity. Method: This study conducts a systematic mapping review of 3,041 publications, filtered down to 100 key studies on Generative Adversarial Networks (GANs) for synthetic data generation, and organizes them along three categorization mappings: application domain, GAN technique, and GAN variant. Contribution/Results: GAN-based oversampling emerges as an effective preprocessing method, and vanilla GAN, CTGAN, and CGAN show strong adaptability for structured imbalanced data. Advanced architectures and tailored frameworks further improve GAN performance on imbalanced data. Notably, none of the reviewed studies explicitly explores hybrid frameworks that combine GANs with diffusion models or reinforcement learning. The findings offer guidance for deploying GAN-based solutions to data imbalance in real-world applications.

📝 Abstract
Machine learning algorithms are used in diverse domains, many of which face significant challenges due to data imbalance. Studies have explored various approaches to address the issue, such as data preprocessing, cost-sensitive learning, and ensemble methods. Generative Adversarial Networks (GANs) have shown considerable potential as a data preprocessing technique that generates good-quality synthetic data. This study employs a systematic mapping methodology to analyze 3,041 papers on GAN-based sampling techniques for imbalanced data, sourced from four digital libraries. A filtering process identified 100 key studies spanning domains such as healthcare, finance, and cybersecurity. Through comprehensive quantitative analysis, this research introduces three categorization mappings: application domains, GAN techniques, and GAN variants used to handle the imbalanced nature of the data. GAN-based over-sampling emerges as an effective preprocessing method. Advanced architectures and tailored frameworks have further improved GAN performance on imbalanced data. GAN variants such as vanilla GAN, CTGAN, and CGAN show great adaptability in structured imbalanced data cases. Interest in GANs for imbalanced data has grown tremendously, peaking in recent years, with journals and conferences playing crucial roles in disseminating foundational theories and practical applications. Despite these advances, none of the reviewed studies explicitly explores hybridized GAN frameworks that incorporate diffusion models or reinforcement learning techniques. This gap points to a direction for future research: developing innovative approaches for effectively handling data imbalance.
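
To make the idea of generator-based over-sampling concrete, the following is a minimal Python sketch of the preprocessing step only: it counts how many synthetic rows each class needs to match the majority class and appends rows produced by a generator callable. It is an illustration under stated assumptions, not the method of any reviewed study. The names `synthetic_counts`, `oversample`, and `make_jitter_generator` are hypothetical, and the jitter-based generator is a stand-in that is not a GAN; in a real pipeline the `generate` hook would wrap a trained generator such as a conditional GAN.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def synthetic_counts(labels: Sequence[int]) -> Dict[int, int]:
    """Number of synthetic rows each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def oversample(
    rows: Sequence[Row],
    labels: Sequence[int],
    generate: Callable[[int, int], List[Row]],
) -> Tuple[List[Row], List[int]]:
    """Append generated rows so every class reaches the majority-class size.

    `generate(cls, n)` is a hypothetical hook: in a GAN pipeline it would wrap
    a trained generator; here any callable returning n rows works.
    """
    out_rows, out_labels = list(rows), list(labels)
    for cls, n_missing in synthetic_counts(labels).items():
        if n_missing > 0:
            out_rows.extend(generate(cls, n_missing))
            out_labels.extend([cls] * n_missing)
    return out_rows, out_labels


def make_jitter_generator(rows, labels, scale=0.05, seed=0):
    """Stand-in for a trained generator: adds Gaussian noise to real rows.

    This is NOT a GAN; it only lets the sketch run without a deep-learning
    library. The scale and seed values are arbitrary assumptions.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, cls in zip(rows, labels):
        by_class.setdefault(cls, []).append(row)

    def generate(cls, n):
        return [
            tuple(v + rng.gauss(0.0, scale) for v in rng.choice(by_class[cls]))
            for _ in range(n)
        ]

    return generate


# Toy data: four majority-class rows (label 0) and one minority row (label 1).
rows = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.2), (0.9, 0.8)]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = oversample(
    rows, labels, make_jitter_generator(rows, labels)
)
print(Counter(balanced_labels))
```

In this toy run the minority class receives three synthetic rows, so both classes end with four rows each. Keeping the generator behind a plain callable is a design choice made for the sketch: it lets the balancing logic stay the same whichever GAN variant supplies the samples.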
Problem

Research questions and friction points this paper is trying to address.

GANs address data imbalance in machine learning
Systematic mapping of GAN-based sampling techniques
Hybrid GAN frameworks for future research
Innovation

Methods, ideas, or system contributions that make the work stand out.

GANs for data preprocessing
Systematic mapping methodology
Advanced GAN architectures