SelectMix: Enhancing Label Noise Robustness through Targeted Sample Mixing

πŸ“… 2025-09-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Deep neural networks are prone to overfitting under label noise, leading to severe degradation in generalization performance. To address this, we propose SelectMixβ€”a noise-robust learning framework comprising three key components: (1) confidence-mismatch analysis via K-fold cross-validation to precisely identify uncertain samples; (2) a class-aware selective mixing strategy that fuses only low-confidence samples with high-confidence samples from the same underlying class; and (3) soft-label alignment for mixed samples to preserve supervision fidelity and prevent noise propagation. Unlike conventional Mixup, SelectMix abandons indiscriminate interpolation and enables controllable, semantics-preserving augmentation. Extensive experiments on MNIST, CIFAR-10/100 under various synthetic noise settings, and the real-world noisy dataset Clothing1M demonstrate consistent superiority over state-of-the-art methods, validating both effectiveness and generalizability.
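The first component, confidence-mismatch analysis via K-fold cross-validation, can be sketched as follows. Each sample is scored by a model trained on the other K-1 folds, and a sample is flagged as uncertain when its out-of-fold prediction disagrees with its given label or falls below a confidence threshold. This is an illustrative sketch, not the paper's implementation: a nearest-centroid classifier with softmax-over-negative-distance confidences stands in for the deep network, and the function name and threshold are assumptions.

```python
import numpy as np

def kfold_confidence_mismatch(X, y, k=5, threshold=0.5, seed=0):
    """Flag potentially noisy samples via out-of-fold confidence mismatch.

    Each sample is predicted by a classifier fit on the remaining K-1 folds;
    samples whose prediction disagrees with the given label, or whose top
    confidence is below `threshold`, are flagged as uncertain.
    A nearest-centroid classifier stands in for the network (an assumption
    for illustration only).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    folds = np.array_split(order, k)
    flagged = np.zeros(n, dtype=bool)
    predicted = np.empty(n, dtype=int)
    classes = np.unique(y)
    for fold in folds:
        train = np.setdiff1d(order, fold)
        # Class centroids estimated from the K-1 training folds.
        centroids = np.stack(
            [X[train][y[train] == c].mean(axis=0) for c in classes]
        )
        # Distances of held-out samples to each centroid, turned into
        # softmax confidences over negative distance.
        d = np.linalg.norm(X[fold][:, None, :] - centroids[None], axis=2)
        conf = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
        pred = classes[conf.argmax(axis=1)]
        predicted[fold] = pred
        flagged[fold] = (pred != y[fold]) | (conf.max(axis=1) < threshold)
    return flagged, predicted
```

On a toy two-cluster dataset with a few flipped labels, the flipped samples are the ones whose out-of-fold prediction mismatches their given label, so they end up flagged while clean samples pass.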

πŸ“ Abstract
Deep neural networks tend to memorize noisy labels, severely degrading their generalization performance. Although Mixup has demonstrated effectiveness in improving generalization and robustness, existing Mixup-based methods typically perform indiscriminate mixing without principled guidance on sample selection and mixing strategy, inadvertently propagating noisy supervision. To overcome these limitations, we propose SelectMix, a confidence-guided mixing framework explicitly tailored for noisy labels. SelectMix first identifies potentially noisy or ambiguous samples through confidence-based mismatch analysis using K-fold cross-validation, then selectively blends the identified uncertain samples with confidently predicted peers from their potential classes. Furthermore, SelectMix employs soft labels derived from all classes involved in the mixing process, ensuring the labels accurately represent the composition of the mixed samples and thus aligning supervision signals closely with the actual mixed inputs. Through extensive theoretical analysis and empirical evaluations on multiple synthetic (MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100) and real-world benchmark datasets (CIFAR-N, MNIST, and Clothing1M), we demonstrate that SelectMix consistently outperforms strong baseline methods, validating its effectiveness and robustness in learning with noisy labels.
Problem

Research questions and friction points this paper is trying to address.

Addresses deep neural networks memorizing noisy labels
Proposes targeted sample mixing to reduce noise propagation
Enhances generalization with confidence-guided label assignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Confidence-guided mixing framework
Soft labels from mixed classes
K-fold cross-validation sample selection
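The second and third contributions, class-aware selective mixing with composition-matched soft labels, can be sketched together. An uncertain sample is blended with a confidently predicted peer from its likely class, and the soft label places mass on every class involved in the mix. This is a minimal sketch under stated assumptions: the function name, the fixed interpolation weight `lam`, and the exact label-weighting scheme are illustrative, not taken from the paper.

```python
import numpy as np

def select_mix(x_uncertain, x_confident, y_given, y_predicted, num_classes, lam=0.7):
    """Selective mixing with a soft label covering all involved classes.

    Blends an uncertain sample with a confident peer from its likely class
    using Mixup-style linear interpolation, and builds a soft label whose
    mass mirrors the composition of the mixed input. The weighting scheme
    is an illustrative assumption, not the paper's exact formulation.
    """
    # Mixup-style interpolation: `lam` weights the confident peer.
    x_mix = lam * x_confident + (1.0 - lam) * x_uncertain
    # Soft label: mass proportional to each component's contribution.
    soft = np.zeros(num_classes)
    soft[y_predicted] += lam        # class of the confident peer
    soft[y_given] += 1.0 - lam      # given (possibly noisy) label of the uncertain sample
    return x_mix, soft
```

For example, mixing an uncertain sample labeled 2 with a confident class-0 peer at `lam=0.75` yields a soft label of 0.75 on class 0 and 0.25 on class 2, so the supervision matches the mixed input rather than either hard label alone.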
Authors
Qiuhao Liu, The Hong Kong University of Science and Technology (Guangzhou)
Ling Li, The Hong Kong University of Science and Technology (Guangzhou)
Yao Lu, BIAI, ZJUT
Qi Xuan, Professor, Zhejiang University of Technology (AI Security, Social Network, Deep Learning, Data Mining)
Zhaowei Zhu, Docta.ai; University of California, Santa Cruz (Machine Learning, Data Quality, Label Noise, Responsible AI)
Jiaheng Wei, The Hong Kong University of Science and Technology (Guangzhou)