Hard-Negative Sampling for Contrastive Learning: Optimal Representation Geometry and Neural- vs Dimensional-Collapse

📅 2023-11-09
📈 Citations: 2
Influential: 0
🤖 AI Summary
Hard-negative sampling in contrastive learning critically influences representation geometry, yet its precise roles in mitigating dimensional collapse (DC) and inducing neural collapse (NC) remain poorly understood. Method: a generalized contrastive-loss framework with general sample-hardening functions, analyzed via equiangular tight frame (ETF) geometry and unit-sphere normalization. Contribution/Results: rigorous proofs that the SCL, HSCL, and UCL losses are minimized by representations exhibiting NC, i.e., class means form an ETF and intra-class features collapse to identical points, and that for any representation the HSCL and HUCL losses are lower bounded by their SCL and UCL counterparts. Crucially, the SCL results cover generic losses, including InfoNCE, without assuming class-conditional independence of augmented views. Theory and experiments jointly demonstrate that under Adam-based batch optimization NC emerges stably only when hard-negative sampling is combined with unit-ball or unit-sphere feature normalization; otherwise, DC prevails. The code is publicly available.
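To make the loss family concrete, below is a minimal PyTorch sketch of a hard-negative supervised contrastive (HSCL-style) loss with unit-sphere feature normalization. The exponential-tilting weights and the hardness parameter `beta` are illustrative assumptions borrowed from common hard-negative-sampling recipes, not the authors' exact hardening function; see their repository for the real implementation.

```python
import torch
import torch.nn.functional as F

def hscl_loss(features, labels, temperature=0.5, beta=1.0):
    """Sketch of a hard-negative supervised contrastive (HSCL-style) loss.

    features: (N, d) raw embeddings; labels: (N,) integer class labels.
    beta >= 0 controls hardness; beta = 0 recovers the unweighted loss.
    """
    z = F.normalize(features, dim=1)                   # unit-sphere normalization
    sim = (z @ z.T) / temperature                      # (N, N) similarity matrix
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (labels[:, None] == labels[None, :]) & ~eye  # same-class, not self
    neg = labels[:, None] != labels[None, :]           # different-class pairs

    # Exponential tilting: negatives with larger similarity (harder) get
    # larger importance weights; the weights average to 1 over the negatives.
    tilt = torch.where(neg, torch.exp(beta * sim), torch.zeros_like(sim))
    w = tilt * neg.sum(1, keepdim=True) / tilt.sum(1, keepdim=True).clamp_min(1e-12)

    neg_mass = (w * torch.exp(sim)).sum(1, keepdim=True)  # weighted negative mass
    log_prob = sim - torch.log(torch.exp(sim) + neg_mass)  # per anchor-positive pair
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp_min(1)
    return loss.mean()
```

Setting `beta = 0` gives every negative unit weight and recovers the unweighted SCL-style loss; raising `beta` shifts mass toward the negatives the current representation confuses most.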
📝 Abstract
For a widely-studied data model and general loss and sample-hardening functions we prove that the losses of Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) are minimized by representations that exhibit Neural-Collapse (NC), i.e., the class means form an Equiangular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses. In contrast to existing literature, our theoretical results for SCL do not require class-conditional independence of augmented views and work for a general loss function class that includes the widely used InfoNCE loss function. Moreover, our proofs are simpler, compact, and transparent. Similar to existing literature, our theoretical claims also hold for the practical scenario where batching is used for optimization. We empirically demonstrate, for the first time, that Adam optimization (with batching) of HSCL and HUCL losses with random initialization and suitable hardness levels can indeed converge to the NC-geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard-negatives or feature normalization, however, the representations learned via Adam suffer from Dimensional-Collapse (DC) and fail to attain the NC-geometry. These results exemplify the role of hard-negative sampling in contrastive representation learning and we conclude with several open theoretical problems for future work. The code can be found at https://github.com/rjiang03/HCL/tree/main
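The claimed NC geometry is directly checkable on learned features. The NumPy sketch below (the function name and return convention are ours, not from the paper's code) measures the two NC criteria from the abstract: intra-class variability collapse and whether the centered class means form a simplex ETF, i.e., equal norms and pairwise cosines of -1/(K-1) for K classes.

```python
import numpy as np

def nc_diagnostics(features, labels):
    """Check the Neural-Collapse (NC) geometry: intra-class collapse and
    whether the centered class means form a simplex ETF.

    features: (N, d) array; labels: (N,) integer labels over K classes.
    Returns three quantities that all approach 0 under NC.
    """
    classes = np.unique(labels)
    K = len(classes)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = means - means.mean(axis=0)          # globally center class means

    # NC1: within-class variability (0 when features collapse to class means).
    within = np.mean([np.var(features[labels == c], axis=0).sum()
                      for c in classes])

    # NC2: ETF check: equal norms and pairwise cosines equal to -1/(K-1).
    norms = np.linalg.norm(centered, axis=1)
    unit = centered / norms[:, None]
    cos = unit @ unit.T
    etf_dev = np.abs(cos[~np.eye(K, dtype=bool)] + 1.0 / (K - 1)).max()
    norm_dev = norms.std() / norms.mean()

    return within, norm_dev, etf_dev
```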
Problem

Research questions and friction points this paper is trying to address.

Characterizes the representation geometry that minimizes contrastive losses: SCL, HSCL, and UCL are all minimized by the Neural-Collapse geometry
Proves that, for any representation mapping, the hard-negative losses (HSCL, HUCL) are lower bounded by the corresponding standard losses (SCL, UCL); see the sketch after this list
Demonstrates empirically that Adam optimization with hard negatives and feature normalization converges to Neural-Collapse, while omitting either leads to Dimensional-Collapse
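The lower-bound claim in the second item admits a short heuristic argument (a hedged sketch assuming exponential-tilting hardening, not the paper's proof for general hardening functions):

```latex
% Negatives of an anchor have similarities s_1, ..., s_m; exponential
% tilting assigns weights w_j = e^{\beta s_j} / \sum_k e^{\beta s_k}, \beta \ge 0.
\[
\sum_{j=1}^{m} w_j\, e^{s_j} \;\ge\; \frac{1}{m}\sum_{j=1}^{m} e^{s_j},
\]
% because $e^{\beta s_j}$ and $e^{s_j}$ are similarly ordered in $s_j$
% (Chebyshev's sum inequality). The hardened negative mass, and hence the
% log-partition term, can only grow, so for every representation $f$:
\[
\mathcal{L}_{\mathrm{HSCL}}(f) \;\ge\; \mathcal{L}_{\mathrm{SCL}}(f),
\qquad
\mathcal{L}_{\mathrm{HUCL}}(f) \;\ge\; \mathcal{L}_{\mathrm{UCL}}(f).
\]
```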
Innovation

Methods, ideas, or system contributions that make the work stand out.

A general loss and sample-hardening framework that covers InfoNCE and does not require class-conditional independence of augmented views
Simpler, compact proofs that the Neural-Collapse geometry (class means forming an ETF) minimizes the SCL, HSCL, and UCL losses, holding even under batched optimization
First empirical demonstration that hard negatives combined with unit-ball or unit-sphere feature normalization avoid Dimensional-Collapse under Adam; a diagnostic sketch follows this list
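Dimensional-Collapse in the last item is commonly diagnosed from the singular-value spectrum of the centered feature matrix, e.g., via effective rank (the exponential of the spectral entropy); a hedged NumPy sketch:

```python
import numpy as np

def effective_rank(features):
    """Effective rank (exponential of spectral entropy) of centered features.

    A value far below min(N, d) indicates Dimensional-Collapse: the learned
    representations occupy only a low-dimensional subspace.
    """
    X = features - features.mean(axis=0)           # remove the global mean
    s = np.linalg.svd(X, compute_uv=False)         # singular values
    p = s / s.sum()                                # normalized spectrum
    p = p[p > 1e-12]                               # drop numerically-zero mass
    return float(np.exp(-(p * np.log(p)).sum()))   # exp(entropy of spectrum)
```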
👥 Authors
Ruijie Jiang
Department of Electrical Engineering, Tufts University
Thuan Q. Nguyen
Department of Engineering, Engineering Technology, East Tennessee State University
Shuchin Aeron
Professor, Electrical and Computer Engineering, Tufts University
Signal Processing, Machine Learning, High-dim Statistics, Optimal Transport
P. Ishwar
Department of Electrical and Computer Engineering, Boston University