🤖 AI Summary
This work bridges the gap between variational modeling and modern self-supervised learning by proposing the first decoder-free variational self-supervised learning framework. To address the limitations of reconstruction-based objectives, the method employs a dual-encoder architecture with a momentum-updated teacher network, replacing pixel-level reconstruction with cross-view denoising. It introduces cosine-based KL divergence and log-likelihood terms to enforce semantic alignment in high-dimensional latent spaces. By eliminating the explicit decoder, the framework significantly improves training efficiency and representation quality. Extensive experiments on CIFAR-10/100 and ImageNet-100 demonstrate competitive or superior performance compared to state-of-the-art methods including BYOL and MoCo v3. These results validate the effectiveness and scalability of integrating probabilistic modeling principles—particularly variational inference—with non-contrastive self-supervision, offering a novel paradigm for efficient, principled representation learning.
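The momentum-updated teacher mentioned above is the standard exponential moving average (EMA) used in BYOL/MoCo-style methods: after each student gradient step, the teacher's weights drift slowly toward the student's. A minimal sketch, using plain Python lists in place of tensors (real implementations update parameter tensors in-place without gradients):

```python
# Minimal EMA teacher update sketch: t <- m*t + (1-m)*s.
# Parameters are plain lists of floats here for illustration only;
# the momentum value 0.99 is a typical choice, not the paper's setting.

def ema_update(teacher_params, student_params, momentum=0.99):
    """Blend teacher parameters toward the student's: t <- m*t + (1-m)*s."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# With momentum 0.99 the teacher moves 1% of the way toward the
# student each step, giving a slowly varying, stable target network.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, momentum=0.99)
```

Because the teacher changes slowly, it provides a stable target distribution (here, the dynamic prior) while the student is trained by gradient descent.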
📝 Abstract
We present Variational Self-Supervised Learning (VSSL), a novel framework that combines variational inference with self-supervised learning to enable efficient, decoder-free representation learning. Unlike traditional VAEs that rely on input reconstruction via a decoder, VSSL symmetrically couples two encoders with Gaussian outputs. A momentum-updated teacher network defines a dynamic, data-dependent prior, while the student encoder produces an approximate posterior from augmented views. The reconstruction term in the ELBO is replaced with a cross-view denoising objective, preserving the analytical tractability of the Gaussian KL divergence. We further introduce cosine-based formulations of the KL and log-likelihood terms to enhance semantic alignment in high-dimensional latent spaces. Experiments on CIFAR-10, CIFAR-100, and ImageNet-100 show that VSSL achieves competitive or superior performance to leading self-supervised methods, including BYOL and MoCo v3. VSSL offers a scalable, probabilistically grounded approach to learning transferable representations without generative reconstruction, bridging the gap between variational modeling and modern self-supervised techniques.
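The "analytical tractability" claim rests on the closed-form KL divergence between two diagonal Gaussians, here the student posterior and the teacher-defined prior. A minimal sketch below implements that closed form, plus one *hypothetical* reading of the cosine-based modification (weighting the KL by the misalignment of the two means); the paper's exact formulation may differ and is not reproduced here:

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(N(mu_q, diag sigma_q^2) || N(mu_p, diag sigma_p^2)).

    Per-dimension: log(sp/sq) + (sq^2 + (mq - mp)^2) / (2 sp^2) - 1/2.
    """
    return sum(
        math.log(sp / sq) + (sq**2 + (mq - mp)**2) / (2 * sp**2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p)
    )

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def cosine_scaled_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Hypothetical cosine weighting: down-weight the KL when the
    posterior and prior means already point in the same direction."""
    return (1.0 - cosine(mu_q, mu_p)) * gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
```

The motivation for a cosine-based term is that in high-dimensional latent spaces, angular alignment of embeddings is a better proxy for semantic similarity than Euclidean closeness, which the plain Gaussian KL penalizes.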