🤖 AI Summary
Existing contrastive learning methods lack a unified theoretical foundation for robustness against label noise. Method: This paper establishes the first general robustness criterion for supervised contrastive losses, derives conditions under which such a loss is provably robust, and rigorously proves that InfoNCE is not robust under label noise. Based on this analysis, the authors propose SymNCE, a symmetrically regularized contrastive loss designed via functional perturbation analysis, explicit noise modeling, and symmetry-based loss construction. Contribution/Results: SymNCE enjoys both theoretical robustness guarantees and strong empirical performance, and the framework unifies and interprets mainstream robust techniques (e.g., nearest-neighbor sample selection). Experiments on CIFAR-10/100 and WebVision demonstrate that SymNCE significantly outperforms state-of-the-art methods, maintaining superior generalization even under high label noise (≥60%) and thereby validating the efficacy of theory-driven design.
📝 Abstract
Learning from noisy labels is a critical challenge in machine learning, with vast implications for numerous real-world scenarios. While supervised contrastive learning has recently emerged as a powerful tool for navigating label noise, many existing solutions remain heuristic, often devoid of a systematic theoretical foundation for crafting robust supervised contrastive losses. To address this gap, in this paper we propose a unified theoretical framework for robust losses under the pairwise contrastive paradigm. In particular, we derive, for the first time, a general robustness condition for arbitrary contrastive losses, which serves as a criterion to verify the theoretical robustness of a supervised contrastive loss against label noise. The theory indicates that the popular InfoNCE loss is in fact non-robust, and accordingly inspires us to develop a robust version of InfoNCE, termed Symmetric InfoNCE (SymNCE). Moreover, we highlight that our theory is an inclusive framework that provides explanations for prior robust techniques such as nearest-neighbor (NN) sample selection and robust contrastive losses. Validation experiments on benchmark datasets demonstrate the superiority of SymNCE against label noise.
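To make the baseline objective concrete, below is a minimal NumPy sketch of a supervised InfoNCE loss, where each anchor treats every other same-label sample in the batch as a positive. The function name, batch construction, and temperature value are illustrative assumptions, not the paper's implementation; SymNCE adds a symmetric regularization term whose exact form is given in the paper and is not reproduced here.

```python
import numpy as np

def sup_infonce(z, labels, tau=0.5):
    """Supervised InfoNCE on L2-normalized embeddings z (n x d).

    For each anchor i, every other sample sharing its label is a positive;
    the denominator sums over all other samples in the batch. Noisy labels
    corrupt the positive sets, which is the failure mode analyzed in the
    paper. (Illustrative sketch, not the authors' code.)
    """
    n = len(labels)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)             # exclude self-similarity
    log_den = np.log(np.exp(sim).sum(axis=1))  # log-sum over all other samples
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if j != i and labels[j] == labels[i]:
                total += log_den[i] - sim[i, j]  # -log(pos_sim / denominator)
                pairs += 1
    return total / max(pairs, 1)

# Embeddings clustered by (clean) class incur a lower loss than mismatched ones.
labels = [0, 0, 1, 1]
clustered = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
mismatched = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
print(sup_infonce(clustered, labels) < sup_infonce(mismatched, labels))  # True
```

Flipping a label in this toy batch moves a cross-class pair into the positive set, which is exactly the perturbation under which the paper shows InfoNCE is non-robust.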