🤖 AI Summary
In learning with noisy labels, not all mislabeled examples are equally harmful: mislabeled examples whose incorrect labels the model predicts with high confidence early in training, termed Mislabeled Easy Examples (MEEs), are particularly damaging, because standard confidence-based selection mistakes them for clean samples. This paper defines MEEs and proposes Early Cutting, a two-stage sample re-calibration paradigm: (1) an initial screening that selects a confident subset based on early-training predictions, followed by (2) a recalibration step that uses the model's later training state to re-select that subset, discarding samples whose early confidence was misleading. Unlike conventional single-stage confidence-based filtering, Early Cutting revisits the early selection with a more mature model. Experiments on CIFAR, WebVision, and full ImageNet-1k demonstrate that Early Cutting reduces the number of selected MEEs and improves classification accuracy over existing sample selection methods.
📝 Abstract
Sample selection is a prevalent approach in learning with noisy labels, aiming to identify confident samples for training. Although existing sample selection methods have achieved decent results by reducing the noise rate of the selected subset, they often overlook that not all mislabeled examples harm the model's performance equally. In this paper, we demonstrate that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We refer to these examples as Mislabeled Easy Examples (MEEs). To address this, we propose Early Cutting, which introduces a recalibration step that employs the model's later training state to re-select the confident subset identified early in training, thereby avoiding misleading confidence from early learning and effectively filtering out MEEs. Experiments on the CIFAR, WebVision, and full ImageNet-1k datasets demonstrate that our method effectively improves sample selection and model performance by reducing MEEs.
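The two-stage re-selection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual criterion: the thresholds, the use of per-sample probabilities on the given label, and the `two_stage_select` helper are all assumptions for exposition.

```python
import numpy as np

def two_stage_select(p_early, p_late, tau_early=0.9, tau_late=0.5):
    """Illustrative two-stage confident-sample selection.

    p_early / p_late: probability the model assigns to each sample's
    *given* label at an early / later training checkpoint (assumed inputs).
    Stage 1 keeps samples the early model is confident about; stage 2
    re-checks that subset with the later model and discards samples it
    no longer supports -- candidate mislabeled easy examples (MEEs).
    """
    stage1 = p_early >= tau_early            # early confident subset
    stage2 = stage1 & (p_late >= tau_late)   # recalibrated subset
    return np.flatnonzero(stage2)            # indices kept for training

# Toy example: six samples.
p_early = np.array([0.95, 0.97, 0.30, 0.92, 0.99, 0.50])
p_late  = np.array([0.90, 0.20, 0.80, 0.85, 0.10, 0.95])
keep = two_stage_select(p_early, p_late)
print(keep)  # samples 1 and 4 were confidently fit early but are
             # doubted later, so they are cut; 0 and 3 survive
```

Samples that pass the early screen but lose support from the later model are exactly the ones single-stage confidence filtering would have kept.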