Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work establishes a theoretical foundation for self-supervised contrastive learning (CL) by clarifying its relationship to a supervised negatives-only contrastive loss (NSCL). The authors prove that the gap between the CL and NSCL objectives vanishes as the number of classes grows, at rate O(1/#classes), and that the globally optimal NSCL representation exhibits a tripartite geometric structure: augmentation collapse, within-class collapse, and class centers forming a simplex equiangular tight frame. They further derive a novel few-shot generalization bound for linear probing that is dominated by directional feature variability. Numerical experiments confirm that the CL and NSCL losses are highly correlated, that their optimization trajectories align closely, and that the proposed bound tightly tracks linear-probe accuracy.

📝 Abstract
Despite its empirical success, the theoretical foundations of self-supervised contrastive learning (CL) are not yet fully established. In this work, we address this gap by showing that standard CL objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL), which excludes same-class contrasts. We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent. We characterize the geometric structure of the global minimizers of the NSCL loss: the learned representations exhibit augmentation collapse, within-class collapse, and class centers that form a simplex equiangular tight frame. We further introduce a new bound on the few-shot error of linear-probing. This bound depends on two measures of feature variability: within-class dispersion and variation along the line between class centers. We show that directional variation dominates the bound and that the within-class dispersion's effect diminishes as the number of labeled samples increases. These properties enable CL and NSCL-trained representations to support accurate few-shot label recovery using simple linear probes. Finally, we empirically validate our theoretical findings: the gap between CL and NSCL losses decays at a rate of $\mathcal{O}(\frac{1}{\#\text{classes}})$; the two losses are highly correlated; minimizing the CL loss implicitly brings the NSCL loss close to the value achieved by direct minimization; and the proposed few-shot error bound provides a tight estimate of probing performance in practice.
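To make the CL/NSCL relationship concrete, here is a minimal NumPy sketch of the two losses as described in the abstract: a SimCLR-style InfoNCE loss, and a negatives-only supervised variant that removes same-class (non-positive) terms from the denominator. The function names, temperature default, and two-view setup are illustrative assumptions, not the paper's code.

```python
import numpy as np

def _normalize(z):
    # Project each row onto the unit sphere.
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def cl_loss(z1, z2, tau=0.5):
    """SimCLR-style InfoNCE over 2n augmented views (illustrative sketch)."""
    z = _normalize(np.vstack([z1, z2]))
    n2 = len(z)
    n = n2 // 2
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                          # drop self-similarity
    pos = np.concatenate([np.arange(n, n2), np.arange(n)])  # each view's other augmentation
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(n2), pos]))

def nscl_loss(z1, z2, labels, tau=0.5):
    """Negatives-only supervised contrastive loss: identical to cl_loss,
    except same-class non-positive terms are excluded from the denominator."""
    z = _normalize(np.vstack([z1, z2]))
    n2 = len(z)
    n = n2 // 2
    y = np.concatenate([labels, labels])
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)
    pos = np.concatenate([np.arange(n, n2), np.arange(n)])
    keep = y[:, None] != y[None, :]                         # other-class contrasts only...
    keep[np.arange(n2), pos] = True                         # ...plus the augmentation positive
    log_denom = np.log(np.where(keep, np.exp(sim), 0.0).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(n2), pos]))
```

Since the NSCL denominator is a subset of the CL denominator, `nscl_loss` is never larger than `cl_loss`; with many classes, few same-class collisions occur in a batch, so the two values should nearly coincide, consistent with the paper's $\mathcal{O}(1/\#\text{classes})$ gap.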
Problem

Research questions and friction points this paper is trying to address.

Theoretical foundations of self-supervised contrastive learning are not fully established.
It is unclear whether standard CL objectives implicitly approximate a supervised objective, and if so, which one.
The geometric structure of CL-trained representations and the few-shot error of linear probing on them lack rigorous characterization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proof that self-supervised CL implicitly approximates the supervised NSCL loss, with a label-agnostic, architecture-independent gap bound
Characterization of NSCL global minimizers: augmentation collapse, within-class collapse, and class centers forming a simplex equiangular tight frame
Few-shot linear-probe error bound governed by within-class dispersion and directional feature variability
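The simplex equiangular tight frame named above has a simple closed form: K unit vectors whose pairwise cosine similarity is exactly -1/(K-1). A small NumPy sketch of this standard construction (the function name and embedding choice are mine, not the paper's):

```python
import numpy as np

def simplex_etf(K, d):
    """K unit-norm class centers in d >= K ambient dims with pairwise
    cosine -1/(K-1): a simplex equiangular tight frame, the geometry
    claimed for class centers at NSCL global minimizers."""
    # Rows of M are already unit-norm by construction.
    M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
    U = np.zeros((K, d))
    U[:, :K] = M            # embed the K x K frame into d dimensions
    return U
```

The frame itself spans only K-1 dimensions; this construction simply pads it into a d-dimensional feature space, as any orthogonal embedding would.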