Hard-Negative Sampling for Contrastive Learning: Optimal Representation Geometry and Neural- vs Dimensional-Collapse

📅 2023-11-09
📈 Citations: 2
Influential: 0
🤖 AI Summary
Hard-negative sampling in contrastive learning critically influences representation geometry, yet its precise roles in mitigating dimensional collapse (DC) and inducing neural collapse (NC) remain poorly understood. Method: a generalized contrastive-loss framework with general sample-hardening functions, analyzed via equiangular tight frame (ETF) geometry and unit-sphere normalization. Contribution/Results: rigorous proofs that the SCL, HSCL, and UCL losses are minimized by representations exhibiting NC, i.e., class means form an ETF and intra-class features collapse to identical points, and that for any representation the HSCL and HUCL losses are lower bounded by their SCL and UCL counterparts. Crucially, the SCL results cover generic losses, including InfoNCE, without assuming class-conditional independence of augmented views. Theory and experiments jointly demonstrate that under Adam-based batch optimization NC emerges stably only when hard-negative sampling is combined with unit-ball or unit-sphere feature normalization; otherwise, DC prevails. The code is publicly available.
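To make the loss family concrete, below is a minimal PyTorch sketch of a hard-negative supervised contrastive (HSCL-style) loss with unit-sphere feature normalization. The exponential-tilting weights and the hardness parameter `beta` are illustrative assumptions borrowed from common hard-negative-sampling recipes, not the authors' exact hardening function; see their repository for the real implementation.

```python
import torch
import torch.nn.functional as F

def hscl_loss(features, labels, temperature=0.5, beta=1.0):
    """Sketch of a hard-negative supervised contrastive (HSCL-style) loss.

    features: (N, d) raw embeddings; labels: (N,) integer class labels.
    beta >= 0 controls hardness; beta = 0 recovers the unweighted loss.
    """
    z = F.normalize(features, dim=1)                   # unit-sphere normalization
    sim = (z @ z.T) / temperature                      # (N, N) similarity matrix
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (labels[:, None] == labels[None, :]) & ~eye  # same-class, not self
    neg = labels[:, None] != labels[None, :]           # different-class pairs

    # Exponential tilting: negatives with larger similarity (harder) get
    # larger importance weights; the weights average to 1 over the negatives.
    tilt = torch.where(neg, torch.exp(beta * sim), torch.zeros_like(sim))
    w = tilt * neg.sum(1, keepdim=True) / tilt.sum(1, keepdim=True).clamp_min(1e-12)

    neg_mass = (w * torch.exp(sim)).sum(1, keepdim=True)  # weighted negative mass
    log_prob = sim - torch.log(torch.exp(sim) + neg_mass)  # per anchor-positive pair
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp_min(1)
    return loss.mean()
```

Setting `beta = 0` gives every negative unit weight and recovers the unweighted SCL-style loss; raising `beta` shifts mass toward the negatives the current representation confuses most.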
📝 Abstract
For a widely-studied data model and general loss and sample-hardening functions we prove that the losses of Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) are minimized by representations that exhibit Neural-Collapse (NC), i.e., the class means form an Equiangular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses. In contrast to existing literature, our theoretical results for SCL do not require class-conditional independence of augmented views and work for a general loss function class that includes the widely used InfoNCE loss function. Moreover, our proofs are simpler, compact, and transparent. Similar to existing literature, our theoretical claims also hold for the practical scenario where batching is used for optimization. We empirically demonstrate, for the first time, that Adam optimization (with batching) of HSCL and HUCL losses with random initialization and suitable hardness levels can indeed converge to the NC-geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard-negatives or feature normalization, however, the representations learned via Adam suffer from Dimensional-Collapse (DC) and fail to attain the NC-geometry. These results exemplify the role of hard-negative sampling in contrastive representation learning and we conclude with several open theoretical problems for future work. The code can be found at https://github.com/rjiang03/HCL/tree/main
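The claimed NC geometry is directly checkable on learned features. The NumPy sketch below (the function name and return convention are ours, not from the paper's code) measures the two NC criteria from the abstract: intra-class variability collapse and whether the centered class means form a simplex ETF, i.e., equal norms and pairwise cosines of -1/(K-1) for K classes.

```python
import numpy as np

def nc_diagnostics(features, labels):
    """Check the Neural-Collapse (NC) geometry: intra-class collapse and
    whether the centered class means form a simplex ETF.

    features: (N, d) array; labels: (N,) integer labels over K classes.
    Returns three quantities that all approach 0 under NC.
    """
    classes = np.unique(labels)
    K = len(classes)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = means - means.mean(axis=0)          # globally center class means

    # NC1: within-class variability (0 when features collapse to class means).
    within = np.mean([np.var(features[labels == c], axis=0).sum()
                      for c in classes])

    # NC2: ETF check: equal norms and pairwise cosines equal to -1/(K-1).
    norms = np.linalg.norm(centered, axis=1)
    unit = centered / norms[:, None]
    cos = unit @ unit.T
    etf_dev = np.abs(cos[~np.eye(K, dtype=bool)] + 1.0 / (K - 1)).max()
    norm_dev = norms.std() / norms.mean()

    return within, norm_dev, etf_dev
```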
Problem

Research questions and friction points this paper is trying to address.

Characterizes the representation geometry that minimizes contrastive losses: SCL, HSCL, and UCL are all minimized by the Neural-Collapse geometry
Proves that, for any representation mapping, the hard-negative losses (HSCL, HUCL) are lower bounded by the corresponding standard losses (SCL, UCL); see the sketch after this list
Demonstrates empirically that Adam optimization with hard negatives and feature normalization converges to Neural-Collapse, while omitting either leads to Dimensional-Collapse
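The lower-bound claim in the second item admits a short heuristic argument (a hedged sketch assuming exponential-tilting hardening, not the paper's proof for general hardening functions):

```latex
% Negatives of an anchor have similarities s_1, ..., s_m; exponential
% tilting assigns weights w_j = e^{\beta s_j} / \sum_k e^{\beta s_k}, \beta \ge 0.
\[
\sum_{j=1}^{m} w_j\, e^{s_j} \;\ge\; \frac{1}{m}\sum_{j=1}^{m} e^{s_j},
\]
% because $e^{\beta s_j}$ and $e^{s_j}$ are similarly ordered in $s_j$
% (Chebyshev's sum inequality). The hardened negative mass, and hence the
% log-partition term, can only grow, so for every representation $f$:
\[
\mathcal{L}_{\mathrm{HSCL}}(f) \;\ge\; \mathcal{L}_{\mathrm{SCL}}(f),
\qquad
\mathcal{L}_{\mathrm{HUCL}}(f) \;\ge\; \mathcal{L}_{\mathrm{UCL}}(f).
\]
```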
Innovation

Methods, ideas, or system contributions that make the work stand out.

A general loss and sample-hardening framework that covers InfoNCE and does not require class-conditional independence of augmented views
Simpler, compact proofs that the Neural-Collapse geometry (class means forming an ETF) minimizes the SCL, HSCL, and UCL losses, holding even under batched optimization
First empirical demonstration that hard negatives combined with unit-ball or unit-sphere feature normalization avoid Dimensional-Collapse under Adam; a diagnostic sketch follows this list
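Dimensional-Collapse in the last item is commonly diagnosed from the singular-value spectrum of the centered feature matrix, e.g., via effective rank (the exponential of the spectral entropy); a hedged NumPy sketch:

```python
import numpy as np

def effective_rank(features):
    """Effective rank (exponential of spectral entropy) of centered features.

    A value far below min(N, d) indicates Dimensional-Collapse: the learned
    representations occupy only a low-dimensional subspace.
    """
    X = features - features.mean(axis=0)           # remove the global mean
    s = np.linalg.svd(X, compute_uv=False)         # singular values
    p = s / s.sum()                                # normalized spectrum
    p = p[p > 1e-12]                               # drop numerically-zero mass
    return float(np.exp(-(p * np.log(p)).sum()))   # exp(entropy of spectrum)
```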
👥 Authors
Ruijie Jiang
Department of Electrical Engineering, Tufts University
Thuan Q. Nguyen
Department of Engineering, Engineering Technology, East Tennessee State University
Shuchin Aeron
Professor, Electrical and Computer Engineering, Tufts University
Signal Processing, Machine Learning, High-dim Statistics, Optimal Transport
P. Ishwar
Department of Electrical and Computer Engineering, Boston University