🤖 AI Summary
This work investigates how label noise affects downstream classification performance in contrastive learning, with particular emphasis on how dimensionality reduction modulates this impact. We first provide a theoretical analysis of how labeling errors propagate through augmentation graphs, revealing that singular value decomposition (SVD) suppresses interference from mislabeled samples, yet excessive dimensionality reduction degrades graph connectivity and impairs representation discriminability. To address this trade-off, we propose a joint optimization strategy that tunes dimensionality reduction and data augmentation together to balance label robustness and graph structural integrity. Theoretical analysis, corroborated by extensive experiments, demonstrates that moderate embedding dimensions (e.g., 512 or 1024), weak augmentation, and SVD jointly mitigate the adverse effects of labeling errors while preserving graph connectivity, thereby improving downstream classification accuracy. Our core contribution lies in uncovering the dual role of dimensionality reduction in contrastive learning robustness and establishing a principled, actionable framework for its calibration.
📝 Abstract
In recent years, contrastive learning has achieved state-of-the-art performance in self-supervised representation learning. Many previous works have attempted to explain the theoretical foundations of this success. Almost all of them rely on a default assumption, the label consistency assumption, which may not hold in practice (the probability of its failure is called the labeling error) due to the strength and randomness of common augmentation strategies such as random resized crop (RRC). This paper investigates the theoretical impact of the labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative effects of the labeling error on the downstream classification risk. To mitigate these effects, we apply a data dimensionality reduction method (e.g., singular value decomposition, SVD) to the original data to reduce false positive samples, and we provide both theoretical and empirical evaluations. Moreover, we find that SVD acts as a double-edged sword: it may also degrade downstream classification accuracy by reducing the connectivity of the augmentation graph. Based on these observations, we suggest using a moderate embedding dimension (such as $512$ or $1024$ in our experiments), data inflation, weak augmentation, and SVD to ensure large graph connectivity and a small labeling error, thereby improving model performance.
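The SVD-based reduction discussed above can be sketched as a plain truncated SVD applied to the (flattened) data before augmentation and contrastive training. This is an illustrative sketch only: the function name `svd_reduce`, the NumPy implementation, and the example dimensions are our assumptions, not the paper's actual pipeline.

```python
import numpy as np

def svd_reduce(X, k):
    """Project data onto its top-k right singular directions (truncated SVD).

    Keeping only the k leading components is the kind of dimensionality
    reduction the abstract argues suppresses false positive (mislabeled)
    augmented pairs; too small a k would hurt augmentation-graph connectivity.
    """
    # Center the data so the singular directions capture variance structure.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: rows of Vt are the right singular vectors, S is sorted descending.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Project onto the k leading directions (equivalently U[:, :k] * S[:k]).
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))  # 256 samples, 64 raw features (illustrative sizes)
Z = svd_reduce(X, k=16)         # reduced representation, shape (256, 16)
print(Z.shape)                  # -> (256, 16)
```

In practice one would pick `k` in the moderate range the paper recommends (e.g., 512 or 1024 for image data) rather than the tiny value used here for illustration.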