AI Summary
Deep neural networks are prone to overfitting under label noise, leading to severe degradation in generalization performance. To address this, we propose SelectMix, a noise-robust learning framework comprising three key components: (1) confidence-mismatch analysis via K-fold cross-validation to precisely identify uncertain samples; (2) a class-aware selective mixing strategy that fuses only low-confidence samples with high-confidence samples from the same underlying class; and (3) soft-label alignment for mixed samples to preserve supervision fidelity and prevent noise propagation. Unlike conventional Mixup, SelectMix abandons indiscriminate interpolation and enables controllable, semantics-preserving augmentation. Extensive experiments on MNIST, CIFAR-10/100 under various synthetic noise settings, and the real-world noisy dataset Clothing1M demonstrate consistent superiority over state-of-the-art methods, validating both effectiveness and generalizability.
Abstract
Deep neural networks tend to memorize noisy labels, severely degrading their generalization performance. Although Mixup has demonstrated effectiveness in improving generalization and robustness, existing Mixup-based methods typically perform indiscriminate mixing without principled guidance on sample selection and mixing strategy, inadvertently propagating noisy supervision. To overcome these limitations, we propose SelectMix, a confidence-guided mixing framework explicitly tailored for noisy labels. SelectMix first identifies potentially noisy or ambiguous samples through confidence-based mismatch analysis using K-fold cross-validation, then selectively blends the identified uncertain samples with confidently predicted peers from their potential classes. Furthermore, SelectMix employs soft labels derived from all classes involved in the mixing process, ensuring the labels accurately represent the composition of the mixed samples and thus aligning supervision signals closely with the actual mixed inputs. Through extensive theoretical analysis and empirical evaluations on multiple synthetic (MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100) and real-world benchmark datasets (CIFAR-N, MNIST, and Clothing1M), we demonstrate that SelectMix consistently outperforms strong baseline methods, validating its effectiveness and robustness in learning with noisy labels.
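The selective mixing and soft-label alignment steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the mixing coefficient `lam`, the function name `selectmix_pair`, and the one-hot soft-label construction are hypothetical, modeled on standard Mixup.

```python
import numpy as np

def selectmix_pair(x_uncertain, y_potential, x_confident, y_confident,
                   num_classes, lam=0.7):
    """Blend one uncertain sample with a confidently predicted peer from
    its potential class, and build a soft label over all classes involved.

    Illustrative sketch only; `lam` is an assumed fixed mixing weight
    favoring the confident sample, analogous to the Mixup coefficient.
    """
    # Convex combination of inputs, weighted toward the confident peer.
    x_mix = lam * x_confident + (1.0 - lam) * x_uncertain
    # Soft label: mass on each involved class in proportion to its share
    # of the mixed input. If both classes coincide, this is one-hot.
    y_soft = np.zeros(num_classes)
    y_soft[y_confident] += lam
    y_soft[y_potential] += 1.0 - lam
    return x_mix, y_soft

# Toy usage: a 4-class problem with 2-dimensional "images". The uncertain
# sample's potential class (3) matches its confident peer's class, so the
# soft label collapses to one-hot on class 3.
x_u = np.array([0.2, 0.8])
x_c = np.array([1.0, 0.0])
x_mix, y_soft = selectmix_pair(x_u, 3, x_c, 3, num_classes=4)
# x_mix → [0.76, 0.24]; y_soft → [0., 0., 0., 1.]
```

Because the soft label always sums to one and places mass only on the classes that actually contributed pixels, the supervision signal matches the composition of the mixed input, which is the alignment property the abstract emphasizes.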