Local and Global Context-and-Object-part-Aware Superpixel-based Data Augmentation for Deep Visual Recognition

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing CutMix methods suffer from rectangular cropping, which discards local discriminative information and overemphasizes global semantics at the expense of part-awareness; moreover, they rely on dual forward passes or auxiliary networks to mitigate label–image inconsistency, compromising efficiency and generalization. This paper proposes LGCOAMix, the first CutMix variant to integrate superpixel segmentation and attention mechanisms into a grid-level context-aware mixing strategy. It enables object-part alignment and cross-image superpixel contrast, achieving end-to-end optimization of label consistency without auxiliary networks or dual forward propagation. LGCOAMix is architecture-agnostic, compatible with both CNNs and Transformers. Extensive experiments demonstrate superior classification accuracy over mainstream CutMix variants across multiple benchmarks. Notably, it significantly improves weakly supervised object localization on CUB200-2011, confirming its effectiveness, computational efficiency, and strong generalization.

📝 Abstract
CutMix-based data augmentation, which uses a cut-and-paste strategy, has shown remarkable generalization capabilities in deep learning. However, existing methods primarily consider global semantics with image-level constraints, which excessively reduces attention to the discriminative local context of the class and leads to a performance-improvement bottleneck. Moreover, existing methods for generating augmented samples usually cut and paste rectangular or square regions, resulting in a loss of object-part information. To mitigate the inconsistency between the augmented image and the generated mixed label, existing methods usually require double forward propagation or rely on an external pre-trained network for object centering, which is inefficient. To overcome these limitations, we propose LGCOAMix, an efficient context-aware and object-part-aware superpixel-based grid blending method for data augmentation. To the best of our knowledge, this is the first time a label mixing strategy using superpixel attention has been proposed for CutMix-based data augmentation, and the first instance of learning local features from discriminative superpixel-wise regions and cross-image superpixel contrasts. Extensive experiments on various benchmark datasets show that LGCOAMix outperforms state-of-the-art CutMix-based data augmentation methods on classification tasks, and on weakly supervised object localization on CUB200-2011. We demonstrate the effectiveness of LGCOAMix not only for CNN networks but also for Transformer networks. Source code is available at https://github.com/DanielaPlusPlus/LGCOAMix.
Problem

Research questions and friction points this paper is trying to address.

Addresses performance bottleneck from global semantics ignoring local context
Resolves loss of object part information from rectangular cut-paste methods
Eliminates inefficiency of double forward propagation or external networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses superpixel attention for label mixing strategy
Learns local features from discriminative superpixel regions
Applies cross-image superpixel contrasts for augmentation
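The core mixing step behind these ideas can be sketched as follows. This is a simplified, area-weighted version: the paper's full method additionally weights the mixed label with superpixel attention, and the superpixel map here is assumed to come from any off-the-shelf segmenter (e.g. SLIC); the function name and demo map are illustrative, not from the paper's code.

```python
import numpy as np

def superpixel_mix(img_a, img_b, sp_map, keep_prob=0.5, rng=None):
    """Paste randomly chosen superpixel regions of img_b onto img_a.

    sp_map: integer label map of shape (H, W) partitioning the image plane
    into superpixels (assumed precomputed, e.g. with SLIC).
    Returns the mixed image and lam, the area fraction kept from img_a,
    used to mix one-hot labels as y = lam * y_a + (1 - lam) * y_b.
    """
    rng = np.random.default_rng(rng)
    ids = np.unique(sp_map)
    # Randomly decide which superpixels are pasted from img_b.
    take_b = ids[rng.random(len(ids)) > keep_prob]
    mask = np.isin(sp_map, take_b)               # True where img_b is pasted
    mixed = np.where(mask[..., None], img_b, img_a)
    lam = 1.0 - mask.mean()                      # area-based label weight
    return mixed, lam

# Toy demo with a synthetic four-region "superpixel" map (quadrants).
H = W = 8
sp = np.zeros((H, W), dtype=int)
sp[:, W // 2:] += 1
sp[H // 2:, :] += 2
a = np.zeros((H, W, 3))
b = np.ones((H, W, 3))
mixed, lam = superpixel_mix(a, b, sp, rng=0)
print(lam)  # fraction of pixels still belonging to img_a
```

Because whole superpixels are transferred rather than a rectangle, object-part boundaries are respected, and the label weight `lam` follows directly from the pasted area, so no second forward pass is needed to repair label inconsistency.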