The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates under which positive-pair sampling conditions contrastive learning recovers meaningful geometric structure in the latent space. Using a measure-theoretic framework, the study formalizes a “diversity condition,” a support requirement on positive-pair sampling that is necessary for isometric recovery of the latent geometry, and clarifies how sampling support and encoder inductive bias jointly affect identifiability. Theoretically, it shows that the standard full-support von Mises–Fisher setting satisfies the diversity condition, so global InfoNCE minimizers recover the latent structure up to an orthogonal transformation; under restricted support, however, non-orthogonal maps can attain strictly lower asymptotic contrastive loss. As a theoretical fix, the authors propose a support-corrected InfoNCE variant that makes orthogonal recovery achievable but does not uniquely select it. Experiments on synthetic benchmarks validate the identifiability predictions, and CIFAR-10 experiments are consistent with inductive bias mattering more when sampling diversity is limited.

📝 Abstract

Contrastive learning has become a leading paradigm for self-supervised representation learning, yet the conditions under which it recovers meaningful latent geometry remain incompletely understood. We develop a measure-theoretic framework formalizing the diversity condition, a support requirement on positive-pair sampling that is necessary for isometric latent recovery. We show that the standard full-support von Mises–Fisher setting satisfies the diversity condition and that, as a consequence, global contrastive loss minimizers recover latent geometry up to orthogonal transformation, whereas restricted conditionals can make non-orthogonal maps attain strictly lower asymptotic contrastive loss. We introduce a support-corrected Information Noise Contrastive Estimation (InfoNCE) variant as a theoretical fix: this correction makes orthogonal latent space recovery achievable but does not uniquely select it. Experiments on synthetic benchmarks validate the identifiability predictions, and CIFAR-10 experiments are consistent with the qualitative prediction that architectural inductive bias becomes more important when sampling diversity is limited. Together, our results clarify how sampling mechanisms and encoder inductive bias interact in contrastive representation learning.
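For readers unfamiliar with the objective the abstract refers to, the following is a minimal sketch of a standard InfoNCE loss with in-batch negatives on unit-normalized embeddings. It is not the authors' implementation and does not include their support correction; the function name `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch only)."""
    # Project both sets of embeddings onto the unit hypersphere.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # logits[i, j] is the scaled similarity between anchor i and candidate j.
    logits = a @ p.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The true positive for anchor i sits on the diagonal.
    return -np.mean(np.diag(log_probs))

# Toy check: correctly paired embeddings versus deliberately misaligned ones.
eye = np.eye(3)
matched = info_nce_loss(eye, eye)
mismatched = info_nce_loss(eye, np.roll(eye, 1, axis=0))
print(matched < mismatched)
```

In this toy check the matched pairs give a loss close to zero, while the misaligned pairs are penalized heavily. How positives are sampled (full versus restricted support) is exactly what the paper analyzes, and this sketch leaves that choice to the caller.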

Problem

Research questions and friction points this paper is trying to address.

contrastive learning

sampling conditions

inductive bias

latent geometry

identifiability

Innovation

Methods, ideas, or system contributions that make the work stand out.

contrastive learning

diversity condition

inductive bias