🤖 AI Summary
This work identifies and quantifies a systematic label assignment bias in CutMix: because labels are weighted by the area of the pasted region, an image often receives non-zero label weight even when the part of it that remains visible is mostly background rather than a semantic object. To address this, the authors propose Object-Aware CutMix (OA-CutMix), which keeps the original image mixing strategy but reassigns label weights according to the proportion of visible object area each image contributes, using precomputed segmentation masks. Experiments show that OA-CutMix consistently outperforms more than ten static and dynamic mixing methods across four backbone architectures and six datasets, with the largest gains on small objects, at a fraction of the training-time cost of dynamic approaches.
📝 Abstract
CutMix has become the de facto standard mixing augmentation, yet its label assignment rests on a flawed assumption: that the area of the pasted patch faithfully reflects its semantic contribution to the mixed image. In practice, however, patches frequently land on background regions, so label credit is assigned to classes whose objects are not visible. The mean discrepancy between the CutMix label and the semantic object area is $21.5\%$, and in $17\%$ of samples an image contributes zero visible object pixels yet receives nonzero label weight. We propose Object-Aware CutMix (OA-CutMix), which corrects this bias by replacing the area-based CutMix weight with one derived from precomputed segmentation masks, assigning labels in proportion to the visible object area each image contributes to the mix. The image mixing procedure is left entirely unchanged. We evaluate OA-CutMix against 10+ static and dynamic mixing methods across 4 architectures and 6 datasets. OA-CutMix consistently achieves the highest accuracy across all tasks, outperforming even dynamic mixing methods at a fraction of their training-time cost. Improvements are largest for small objects, where the label bias from CutMix is greatest. Thus, correcting the label alone is sufficient to match or exceed the performance of methods that modify the image mixing algorithm.
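To make the difference between the two weighting schemes concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the `(y0, x0, y1, x1)` box convention, the use of boolean foreground masks, and the fallback to the area-based weight when neither image shows any object pixels are all assumptions made for illustration. The abstract only states that labels are assigned in proportion to the visible object area each image contributes.

```python
import numpy as np


def cutmix_area_weight(box, height, width):
    """Standard CutMix weight for the base image A: 1 minus the pasted area fraction."""
    y0, x0, y1, x1 = box
    return 1.0 - ((y1 - y0) * (x1 - x0)) / (height * width)


def object_aware_weight(mask_a, mask_b, box):
    """Hypothetical object-aware weight for the base image A.

    mask_a, mask_b: boolean (H, W) foreground masks of images A and B.
    box: (y0, x0, y1, x1) region of B pasted onto A at the same coordinates.
    The weight for image B is 1 minus the returned value.
    """
    y0, x0, y1, x1 = box
    h, w = mask_a.shape
    inside = np.zeros((h, w), dtype=bool)
    inside[y0:y1, x0:x1] = True

    # Object pixels of A that survive outside the patch,
    # and object pixels of B that land inside the patch.
    visible_a = np.count_nonzero(mask_a & ~inside)
    visible_b = np.count_nonzero(mask_b & inside)

    total = visible_a + visible_b
    if total == 0:
        # Assumption: with no visible object in either image,
        # fall back to the area-based CutMix weight.
        return cutmix_area_weight(box, h, w)
    return visible_a / total


# Toy 8x8 example: A's object sits top-left, B's object is a small top-left blob,
# and the pasted patch covers the bottom-right quadrant (background in B).
mask_a = np.zeros((8, 8), dtype=bool)
mask_a[0:4, 0:4] = True
mask_b = np.zeros((8, 8), dtype=bool)
mask_b[0:2, 0:2] = True
box = (4, 4, 8, 8)

area_w = cutmix_area_weight(box, 8, 8)          # B would get 1 - area_w
object_w = object_aware_weight(mask_a, mask_b, box)  # B would get 1 - object_w
```

In this toy case the patch from B contains only background, so the area-based rule still credits B with a quarter of the label while the object-aware rule credits it with nothing, which is the failure case the abstract describes. The mixed image itself is identical under both rules; only the label weight changes.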