Close to Reality: Interpretable and Feasible Data Augmentation for Imbalanced Learning

📅 2026-03-14
🤖 AI Summary
This work proposes a novel data augmentation framework based on Decision Predicate Graphs (DPG-da) to address the limitations of traditional oversampling methods, which often generate unrealistic, infeasible, or uninterpretable synthetic samples. By integrating interpretable decision predicates extracted from trained models into the oversampling process and embedding domain-specific logical rules, the proposed approach ensures that generated samples are not only diverse but also logically consistent and semantically plausible. Experimental results across multiple synthetic and real-world imbalanced datasets demonstrate that DPG-da significantly outperforms existing oversampling techniques in terms of classification performance while providing transparent and traceable explanations for the synthesized instances.

📝 Abstract
Many machine learning classification tasks involve imbalanced datasets, which are often subject to over-sampling techniques aimed at improving model performance. However, these techniques are prone to generating unrealistic or infeasible samples. Furthermore, they often function as black boxes, lacking interpretability in their procedures. This opacity makes it difficult to track their effectiveness and provide necessary adjustments, and they may ultimately fail to yield significant performance improvements. To bridge this gap, we introduce the Decision Predicate Graphs for Data Augmentation (DPG-da), a framework that extracts interpretable decision predicates from trained models to capture domain rules and enforce them during sample generation. This design ensures that over-sampled data remain diverse, constraint-satisfying, and interpretable. In experiments on synthetic and real-world benchmark datasets, DPG-da consistently improves classification performance over traditional over-sampling methods, while guaranteeing logical validity and offering clear, interpretable explanations of the over-sampled data.
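The core idea described in the abstract, generating synthetic minority samples while enforcing interpretable domain rules, can be sketched as SMOTE-style interpolation with rejection sampling against a set of predicates. This is a minimal illustration of the concept, not the paper's actual DPG-da algorithm: the predicates here are hypothetical placeholders standing in for the decision predicates the framework would extract from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feasibility rules (placeholders for predicates a DPG
# would extract from a trained model); each returns True if a sample
# is logically valid.
predicates = [
    lambda x: x[0] >= 0.0,          # e.g. "this feature cannot be negative"
    lambda x: x[1] <= x[0] + 1.0,   # e.g. a logical relation between features
]

def constrained_oversample(X_min, n_new, predicates, max_tries=1000):
    """Interpolate between random minority-sample pairs (SMOTE-style),
    keeping only candidates that satisfy every predicate."""
    samples, tries = [], 0
    while len(samples) < n_new and tries < max_tries:
        i, j = rng.choice(len(X_min), size=2, replace=False)
        lam = rng.random()
        candidate = X_min[i] + lam * (X_min[j] - X_min[i])
        if all(p(candidate) for p in predicates):  # reject infeasible samples
            samples.append(candidate)
        tries += 1
    return np.array(samples)

# Toy minority class: 4 samples in a 2-D feature space.
X_min = np.array([[0.2, 0.5], [0.8, 1.2], [0.5, 0.9], [1.0, 1.5]])
X_new = constrained_oversample(X_min, n_new=10, predicates=predicates)
```

Unlike plain SMOTE, every synthetic sample here is guaranteed to satisfy the stated rules, and each rejection is traceable to a specific violated predicate, which is the interpretability property the paper emphasizes.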
Problem

Research questions and friction points this paper is trying to address.

imbalanced learning
data augmentation
over-sampling
interpretability
feasibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

interpretable data augmentation
imbalanced learning
decision predicate graphs
feasible sample generation
model interpretability
Matheus Camilo da Silva
Department of Mathematics, Computer Science and Geosciences, University of Trieste, Piazzale Europa, 1, 34127 Trieste TS, Italy
Gabriel Gustavo Costanzo
Department of Mathematics, Computer Science and Geosciences, University of Trieste, Piazzale Europa, 1, 34127 Trieste TS, Italy
Andrea de Lorenzo
Department of Mathematics, Computer Science and Geosciences, University of Trieste, Piazzale Europa, 1, 34127 Trieste TS, Italy
Sylvio Barbon Junior
University of Trieste, Department of Engineering and Architecture
Machine Learning
Process Mining
Explainable AI