🤖 AI Summary
Existing self-supervised learning methods are largely confined to Euclidean spaces, limiting their ability to capture nonlinear geometric structures inherent in data. To address this, we propose the first fully kernelized variant of VICReg, introducing a general self-supervised framework grounded in reproducing kernel Hilbert spaces (RKHS). Our method reformulates VICReg’s variance, invariance, and covariance regularization objectives using doubly-centered kernel matrices and the Hilbert–Schmidt norm—enabling nonlinear representation learning without explicit feature mapping and theoretically preventing representation collapse. Evaluated across benchmarks from MNIST to ImageNet-100, our approach consistently outperforms standard VICReg, with particularly pronounced gains on low-shot and high-curvature datasets. UMAP visualizations further confirm that learned embeddings exhibit enhanced inter-class separation and local isometry. This work bridges kernel methods and modern self-supervised learning, offering a theoretically grounded, geometry-aware alternative to Euclidean-based contrastive and redundancy-reduction paradigms.
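For readers unfamiliar with the two ingredients named above, the following identities are the standard forms (textbook facts, not details taken from the paper): given a Gram matrix K with entries K_ij = k(z_i, z_j) over a batch of n embeddings, double centering, the Hilbert–Schmidt norm of the empirical covariance operator, and the RKHS distance used by an invariance-style term are

```latex
% Doubly-centered Gram matrix (H is the centering matrix):
K_c = H K H, \qquad H = I_n - \tfrac{1}{n}\,\mathbf{1}_n \mathbf{1}_n^{\top}.
% Squared Hilbert--Schmidt norm of the empirical covariance operator:
\|\widehat{C}\|_{\mathrm{HS}}^2 = \frac{1}{n^2}\,\mathrm{tr}(K_c K_c) = \frac{1}{n^2}\,\|K_c\|_F^2.
% RKHS distance between paired views, via the kernel trick:
\|\phi(z_i) - \phi(z_i')\|_{\mathcal{H}}^2 = k(z_i, z_i) + k(z_i', z_i') - 2\,k(z_i, z_i').
```

These identities are what make the approach tractable: every term can be evaluated from kernel matrices alone, without instantiating the (possibly infinite-dimensional) feature map φ.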
📝 Abstract
Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives, such as invariance to augmentations, variance preservation, and feature decorrelation, without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective into a reproducing kernel Hilbert space (RKHS). By kernelizing each term of the loss (variance, invariance, and covariance), we obtain a general formulation that operates on doubly-centered kernel matrices and Hilbert–Schmidt norms, enabling nonlinear feature learning without explicit feature maps.
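The abstract does not spell out the loss, but the sketch below shows one way the three kernelized terms could be assembled. Everything here is an assumption for illustration: the RBF kernel, the hinge-style variance term on the trace of the doubly-centered Gram matrix, the covariance penalty via the squared Frobenius (Hilbert–Schmidt) norm, and the weights `lam`, `mu`, `nu` are hypothetical stand-ins, not the paper's exact formulation.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    """RBF Gram matrix between the rows of x and y (assumed kernel choice)."""
    d2 = torch.cdist(x, y) ** 2              # pairwise squared distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def double_center(K):
    """K_c = H K H with the centering matrix H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = torch.eye(n, device=K.device) - torch.full((n, n), 1.0 / n, device=K.device)
    return H @ K @ H

def kernel_vicreg_loss(z1, z2, sigma=1.0, gamma=1.0, lam=25.0, mu=25.0, nu=1.0):
    """Hypothetical kernelized VICReg loss on two batches of paired embeddings."""
    n = z1.shape[0]
    k11 = rbf_kernel(z1, z1, sigma)
    k22 = rbf_kernel(z2, z2, sigma)
    k12 = rbf_kernel(z1, z2, sigma)

    # Invariance: mean squared RKHS distance between paired views,
    # ||phi(z1_i) - phi(z2_i)||^2 = k(z1_i,z1_i) + k(z2_i,z2_i) - 2 k(z1_i,z2_i).
    inv = (k11.diagonal() + k22.diagonal() - 2 * k12.diagonal()).mean()

    reg = 0.0
    for K in (k11, k22):
        Kc = double_center(K)
        # Variance: hinge keeping the total RKHS variance tr(K_c)/n above gamma.
        var = torch.relu(gamma - Kc.diagonal().mean())
        # Covariance: squared Hilbert-Schmidt norm of the empirical covariance
        # operator, ||C_hat||_HS^2 = ||K_c||_F^2 / n^2, which flattens the
        # kernel spectrum while the variance term keeps the trace large.
        cov = (Kc ** 2).sum() / n ** 2
        reg = reg + mu * var + nu * cov
    return lam * inv + reg
```

A call such as `kernel_vicreg_loss(encoder(x_a), encoder(x_b))` would then be minimized with respect to the encoder, mirroring the training loop of standard VICReg.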
We demonstrate that Kernel VICReg not only avoids representational collapse but also improves performance on tasks with complex or small-scale data. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet-100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structure is prominent. UMAP visualizations further confirm that kernel-based embeddings exhibit better isometry and class separation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.