Training-Free Dataset Pruning for Instance Segmentation

📅 2025-03-02
🤖 AI Summary
Existing dataset pruning methods are designed primarily for image classification and adapt poorly to instance segmentation, where pixel-level annotations, large variations in instance scale, and severe class imbalance all make pruning harder. Most of these methods also rely on model training, which makes them slow. This paper introduces the first training-free pruning framework tailored for instance segmentation. It proposes a Shape Complexity Score (SCS) grounded in geometric shape analysis and refines it into two variants, Scale-Invariant SCS (SI-SCS) and Class-Balanced SCS (CB-SCS), removing any dependence on model training. The method computes scores solely from the shape and class information in the image annotations, which makes pruning highly efficient. Evaluated on PASCAL VOC 2012, Cityscapes, and COCO, it achieves state-of-the-art pruning performance and generalizes across both CNN- and Transformer-based architectures. On COCO, pruning is on average 1349× faster than with the adapted training-dependent baselines.

📝 Abstract
Existing dataset pruning techniques primarily focus on classification tasks, limiting their applicability to more complex and practical tasks like instance segmentation. Instance segmentation presents three key challenges: pixel-level annotations, instance area variations, and class imbalances, which significantly complicate dataset pruning efforts. Directly adapting existing classification-based pruning methods proves ineffective due to their reliance on a time-consuming model training process. To address this, we propose a novel Training-Free Dataset Pruning (TFDP) method for instance segmentation. Specifically, we leverage shape and class information from image annotations to design a Shape Complexity Score (SCS), refining it into Scale-Invariant (SI-SCS) and Class-Balanced (CB-SCS) versions to address instance area variations and class imbalances, all without requiring model training. We achieve state-of-the-art results on the VOC 2012, Cityscapes, and COCO datasets, generalizing well across CNN and Transformer architectures. Remarkably, our approach accelerates the pruning process by an average of 1349$\times$ on COCO compared to the adapted baselines. Source code is available at: https://github.com/he-y/dataset-pruning-for-instance-segmentation
Problem

Research questions and friction points this paper is trying to address.

Address dataset pruning for instance segmentation tasks
Overcome challenges like pixel-level annotations and class imbalances
Develop training-free method to accelerate pruning process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free Dataset Pruning (TFDP) method
Shape Complexity Score (SCS) for instance segmentation
Scale-Invariant and Class-Balanced SCS versions
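The idea of scoring images from annotation shapes alone can be illustrated with a small sketch. This page does not state the actual SCS, SI-SCS, or CB-SCS formulas, so everything below is an assumption made for illustration: the score is taken to be mask perimeter divided by the square root of mask area (a quantity that does not change when a shape is uniformly rescaled), class balancing is taken to be division by the mean score of the instance's class, and the highest-scoring images are the ones kept. The function names (`area_and_perimeter`, `scale_invariant_score`, `class_balanced_scores`, `prune`) are hypothetical and are not taken from the authors' repository.

```python
import math
from collections import defaultdict


def area_and_perimeter(mask):
    """Count foreground pixels and exposed pixel edges (4-neighbourhood)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            area += 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                # An edge is exposed if the neighbour is outside the grid
                # or is a background pixel.
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    perimeter += 1
    return area, perimeter


def scale_invariant_score(mask):
    """Assumed score: perimeter / sqrt(area); unchanged under uniform rescaling."""
    area, perimeter = area_and_perimeter(mask)
    return perimeter / math.sqrt(area) if area else 0.0


def class_balanced_scores(instances):
    """instances: iterable of (image_id, class_name, mask).

    Assumed balancing: divide each instance score by its class mean,
    then sum the normalized scores per image.
    """
    raw = [(img, cls, scale_invariant_score(m)) for img, cls, m in instances]
    by_class = defaultdict(list)
    for _, cls, score in raw:
        by_class[cls].append(score)
    class_mean = {c: sum(v) / len(v) for c, v in by_class.items()}
    image_score = defaultdict(float)
    for img, cls, score in raw:
        mean = class_mean[cls]
        image_score[img] += score / mean if mean else 0.0
    return dict(image_score)


def prune(instances, keep_ratio):
    """Return the image ids to keep (assumption: keep the highest scores)."""
    scores = class_balanced_scores(instances)
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    square = [[1, 1], [1, 1]]
    bar = [[1, 1, 1, 1]]
    data = [("img_a", "car", square), ("img_b", "car", bar)]
    print(prune(data, keep_ratio=0.5))
```

No model is trained or evaluated anywhere in this sketch: the only inputs are masks and class labels, which is the property the training-free approach relies on. With real data the masks would come from the dataset's polygon or RLE annotations rather than hand-written grids.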
Yalun Dai
Nanyang Technological University
deep learning
Lingao Xiao
National University of Singapore
Efficient Deep Learning
Ivor W. Tsang
CFAR, Agency for Science, Technology and Research, Singapore; Nanyang Technological University
Yang He
CFAR, Agency for Science, Technology and Research, Singapore; National University of Singapore