VISReg: Variance-Invariance-Sketching Regularization for JEPA training

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a tension in self-supervised learning: a regularizer should constrain the full shape of the embedding distribution while keeping training stable and preventing representation collapse. The authors propose VISReg, which decouples distributional shape constraints from scale control. It replaces VICReg's covariance regularization with a sliced-Wasserstein sketching term that captures distribution shape, and keeps an independent variance term to regulate scale. By combining the flexibility of VICReg with the distributional rigor of sketching methods such as SIGReg, VISReg mitigates collapse and provides robust gradients even when collapse occurs. Evaluated within the JEPA framework, VISReg pretrained on ImageNet-1K achieves state-of-the-art performance on out-of-distribution datasets, outperforms existing regularizers on low-quality datasets, and remains resilient in long-tailed and low-rank regimes. Pretrained on ImageNet-22K, it matches DINOv2's out-of-distribution performance even though DINOv2 uses 10x more data (LVD-142M).
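The split between a scale term and a shape term described above can be sketched in plain Python (no autodiff, so this only evaluates the loss). Everything here is an assumption for illustration, not the authors' code: the invariance and variance terms follow VICReg's usual form (mean squared difference, and a hinge on each dimension's standard deviation with target `gamma = 1`); the sketching term is a Monte Carlo sliced W2² to N(0, I) over random unit directions; each projected slice is standardized so that term sees shape rather than scale; and the weights `lam`, `mu`, `nu` and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def invariance_term(za, zb):
    """VICReg-style invariance: mean squared difference of paired embeddings."""
    return fmean(
        fmean((a - b) ** 2 for a, b in zip(ra, rb)) for ra, rb in zip(za, zb)
    )


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on each dimension's std: penalizes scale below gamma (scale only)."""
    n, d = len(z), len(z[0])
    hinge = []
    for j in range(d):
        col = [row[j] for row in z]
        mu = fmean(col)
        var = sum((x - mu) ** 2 for x in col) / (n - 1)  # unbiased variance
        hinge.append(max(0.0, gamma - math.sqrt(var + eps)))
    return fmean(hinge)


def random_directions(d, k, seed=0):
    """k random unit vectors in R^d (normalized Gaussian draws)."""
    rng = random.Random(seed)
    dirs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        dirs.append([x / norm for x in v])
    return dirs


def sliced_w2_shape(z, dirs, eps=1e-4):
    """Monte Carlo sliced W2^2 between the batch z and N(0, I).

    Assumption: each 1-D projection is standardized first, so this term
    measures shape only and leaves scale to variance_term."""
    n = len(z)
    nd = NormalDist()
    target = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]  # N(0,1) quantiles
    per_slice = []
    for v in dirs:
        proj = [sum(a * b for a, b in zip(row, v)) for row in z]
        mu = fmean(proj)
        sd = math.sqrt(fmean((p - mu) ** 2 for p in proj) + eps)
        # In 1-D, optimal transport pairs sorted samples with sorted targets.
        std_proj = sorted((p - mu) / sd for p in proj)
        per_slice.append(fmean((a - b) ** 2 for a, b in zip(std_proj, target)))
    return fmean(per_slice)


def visreg_like_loss(za, zb, dirs, lam=1.0, mu=1.0, nu=1.0):
    """Hypothetical weighted sum: invariance + variance (scale) + sketch (shape)."""
    inv = invariance_term(za, zb)
    var = variance_term(za) + variance_term(zb)
    shape = sliced_w2_shape(za, dirs) + sliced_w2_shape(zb, dirs)
    return lam * inv + mu * var + nu * shape


# Usage: two noisy "views" of Gaussian embeddings vs. a fully collapsed batch.
rng = random.Random(1)
za = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
zb = [[x + rng.gauss(0.0, 0.1) for x in row] for row in za]
collapsed = [[0.5] * 8 for _ in range(256)]
dirs = random_directions(d=8, k=16)

print(visreg_like_loss(za, zb, dirs))
print(variance_term(collapsed))  # ≈ 0.99: the scale hinge stays active
print(sliced_w2_shape(collapsed, dirs))
```

In this sketch a collapsed batch keeps the variance hinge near its maximum for every dimension, which is the kind of scale signal the summary attributes to a separate variance term; how the paper actually combines and weights the terms is not specified here.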

📝 Abstract

Self-supervised learning methods prevent embedding collapse via modeling heuristics or explicit regularization of the embedding space. Among the latter, VICReg decomposes regularization into variance and covariance objectives, offering flexibility and interpretability. However, covariance captures only second-order statistics -- encouraging decorrelation but failing to enforce the full distributional shape needed for stable training. Sketching-based methods such as SIGReg address this by aligning embeddings to an isotropic Gaussian, but lack flexibility and suffer from vanishing gradients under collapse. We propose Variance-Invariance-Sketching Regularization (VISReg), which replaces covariance with a Sliced-Wasserstein-based sketching objective that enforces full distributional shape, while retaining a variance term for scale control. By decoupling scale and shape, VISReg combines VICReg's flexibility with the distributional rigor of sketching methods, providing robust gradients even under collapse. We show that VISReg scales linearly, outperforms existing regularization on low-quality datasets, and is resilient to long-tailed and low-rank regimes. Pre-trained on ImageNet-1K, VISReg achieves state-of-the-art performance on out-of-distribution datasets. Pre-trained on ImageNet-22K, it matches DINOv2's OOD performance despite the latter using 10x more data (LVD-142M). Project and code: https://haiyuwu.github.io/visreg.
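The abstract's point that second-order statistics do not pin down distributional shape can be illustrated with a single 1-D slice, the one-direction special case of a sliced-Wasserstein sketch. This is a minimal sketch under our own assumptions: the distance is approximated by pairing sorted samples with midpoint quantiles of N(0, 1), evenly spaced quantiles stand in for random samples so the result is deterministic, and the function names are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def standardize(xs):
    """Shift to mean 0 and rescale to (population) standard deviation 1."""
    mu, sd = fmean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]


def w2_to_std_normal(xs):
    """Squared 1-D Wasserstein-2 distance between a sample and N(0, 1),
    approximated by pairing sorted samples with midpoint normal quantiles."""
    n = len(xs)
    nd = NormalDist()
    target = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    return fmean((a - b) ** 2 for a, b in zip(sorted(xs), target))


n = 1000
nd = NormalDist()
# Deterministic stand-ins for samples: evenly spaced quantiles of each law.
gauss = standardize([nd.inv_cdf((i + 0.5) / n) for i in range(n)])
unif = standardize([(i + 0.5) / n for i in range(n)])

# First- and second-order statistics match (mean 0, std 1) ...
for name, xs in (("gauss", gauss), ("unif", unif)):
    print(name, round(fmean(xs), 6), round(pstdev(xs), 6))
# ... yet the shape distance separates them.
print("W2^2 gauss:", w2_to_std_normal(gauss))
print("W2^2 unif: ", w2_to_std_normal(unif))
```

In this 1-D case a variance or covariance penalty scores both samples identically, while the Wasserstein term is near zero only for the Gaussian one. Averaging the same comparison over many random projection directions of multi-dimensional embeddings gives a sliced-Wasserstein sketch of the kind the abstract describes.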

Problem

Research questions and friction points this paper is trying to address.

self-supervised learning

embedding collapse

regularization

distributional shape

covariance

Innovation

Methods, ideas, or system contributions that make the work stand out.

VISReg

Sliced-Wasserstein

self-supervised learning