🤖 AI Summary
Deep models overfit in few-shot learning and under distribution shift, and existing mask-based data augmentation methods lack structural awareness. Method: This paper proposes a structure-aware, granular-ball-guided data augmentation method, introducing Granular-ball Computing (GBC) into data augmentation for the first time. A coarse-to-fine hierarchical masking strategy adaptively preserves semantically rich, structurally critical regions while suppressing redundant information, yielding a model-agnostic, plug-and-play framework compatible with both CNNs and Vision Transformers (ViTs). Contribution/Results: Evaluated on multiple benchmarks, the method significantly improves classification accuracy and masked image reconstruction quality. Its core innovation is adaptive mask generation jointly guided by semantic and structural cues, which strengthens generalization and robustness under limited-data and out-of-distribution scenarios.
📝 Abstract
Deep learning models have achieved remarkable success in computer vision, but they still rely heavily on large-scale labeled data and tend to overfit when data are limited or distributions shift. Data augmentation, particularly mask-based information dropping, can enhance robustness by forcing models to explore complementary cues; however, existing approaches often lack structural awareness and may discard essential semantics. We propose Granular-ball Guided Masking (GBGM), a structure-aware augmentation strategy guided by Granular-ball Computing (GBC). GBGM adaptively preserves semantically rich, structurally important regions while suppressing redundant areas through a coarse-to-fine hierarchical masking process, producing augmentations that are both representative and discriminative. Extensive experiments on multiple benchmarks demonstrate consistent improvements in classification accuracy and masked image reconstruction, confirming the effectiveness and broad applicability of the proposed method. Simple and model-agnostic, it integrates seamlessly into CNNs and Vision Transformers and provides a new paradigm for structure-aware data augmentation.
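The coarse-to-fine idea in the abstract can be sketched in a few lines. The toy below scores image patches by local variance as a stand-in for the paper's granular-ball importance criterion (the actual GBGM scoring is not specified here), masks the least-structured coarse cells, then further thins redundant fine patches; all names and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hierarchical_mask(img, coarse=4, fine=8, keep_ratio=0.5):
    """Illustrative coarse-to-fine masking (NOT the paper's algorithm).

    Patch variance is used as a crude proxy for structural importance;
    GBGM instead derives importance from granular-ball computing.
    Returns the masked image and the boolean keep-mask.
    """
    H, W = img.shape
    mask = np.ones((H, W), dtype=bool)

    def patch_scores(gran):
        # Score each gran x gran cell by the variance of its pixels.
        ph, pw = H // gran, W // gran
        scores = np.empty((gran, gran))
        for i in range(gran):
            for j in range(gran):
                scores[i, j] = img[i*ph:(i+1)*ph, j*pw:(j+1)*pw].var()
        return scores, ph, pw

    def drop_below_quantile(gran):
        # Mask cells whose score falls below the (1 - keep_ratio) quantile.
        s, ph, pw = patch_scores(gran)
        thresh = np.quantile(s, 1 - keep_ratio)
        for i in range(gran):
            for j in range(gran):
                if s[i, j] < thresh:
                    mask[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = False

    drop_below_quantile(coarse)  # coarse stage: remove redundant regions
    drop_below_quantile(fine)    # fine stage: refine within what survives
    return img * mask, mask
```

The two-stage structure mirrors the described hierarchy: the coarse pass removes large low-information regions cheaply, and the fine pass intersects a finer-grained mask so that only patches judged redundant at both scales are dropped.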