AI Summary
This work addresses the high computational cost and training complexity of existing self-supervised learning methods, which often rely on large batch sizes, memory banks, or momentum encoders. The authors propose a Semantic Mutual Information (SMI) objective. It models mutual information through a sample-level dependency matrix under a Gaussian assumption and applies a nonlinear transformation to pairwise correlations, which strengthens semantic alignment while preserving feature diversity. Because SMI avoids high-dimensional feature correlation matrices and uses a lightweight optimization mechanism, it substantially reduces computational complexity, and the authors' analyses suggest a better balance between alignment and redundancy. With a ResNet-50 backbone, SMI is competitive with state-of-the-art methods in ImageNet linear evaluation. It also consistently outperforms Barlow Twins on low-resource transfer benchmarks, particularly fine-grained ones, and its representations appear more spatially localized.
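The cost argument can be made concrete with a back-of-the-envelope count. For N samples with D-dimensional embeddings from two views, a feature-level cross-correlation (as in Barlow Twins-style objectives) is a D × D matrix, while a sample-level matrix is N × N. The following minimal sketch assumes that SMI's dependency matrix is the N × N product of the two views' embeddings and counts entries and multiply-accumulate (MAC) operations for each matrix. The function name `correlation_costs` and the sizes N = 256, D = 8192 are hypothetical illustrations, not values taken from the paper.

```python
import numpy as np


def correlation_costs(n: int, d: int) -> dict:
    """Matrix sizes and multiply-accumulate (MAC) counts for N x D embeddings of two views.

    feature_*: D x D cross-correlation Z_a^T Z_b (Barlow Twins-style objectives).
    sample_*:  N x N matrix Z_a Z_b^T (assumed shape of a sample-level dependency matrix).
    """
    return {
        "feature_entries": d * d,
        "feature_macs": n * d * d,
        "sample_entries": n * n,
        "sample_macs": n * n * d,
    }


# Shape check on small random embeddings (N = 8 samples, D = 32 features).
rng = np.random.default_rng(0)
za, zb = rng.standard_normal((8, 32)), rng.standard_normal((8, 32))
print((za.T @ zb).shape, (za @ zb.T).shape)  # (32, 32) (8, 8)

# Illustrative sizes only: batch N = 256, embedding width D = 8192.
costs = correlation_costs(256, 8192)
print(costs["feature_macs"] // costs["sample_macs"])  # 32, i.e. D / N
```

Under these assumptions the saving factor is D / N. It is therefore largest when the embedding width exceeds the batch size and shrinks as the batch grows.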
Abstract
Self-supervised learning (SSL) has achieved remarkable representation learning performance, but many existing methods rely on large batch sizes, memory banks, momentum encoders, or global synchronization mechanisms that substantially increase computational cost and training complexity. In this work, we propose Semantic Mutual Information (SMI), a lightweight self-supervised objective derived from a mutual-information-inspired dependency formulation under Gaussian assumptions. Unlike conventional correlation-matching objectives, which operate on high-dimensional feature correlation matrices, SMI optimizes a sample-level dependency matrix obtained through a nonlinear transformation of pairwise correlations. This formulation induces distinct optimization dynamics that emphasize strongly dependent semantic pairs while maintaining representation diversity. Experiments on ImageNet with a ResNet-50 backbone show that SMI achieves linear evaluation performance competitive with state-of-the-art SSL approaches while substantially reducing computational complexity. Across multiple low-resource benchmarks, SMI consistently improves transfer performance over Barlow Twins, particularly on fine-grained datasets. Furthermore, analyses of optimization dynamics and representation geometry suggest an improved alignment–redundancy balance, greater feature diversity, and more spatially localized semantic representations. These results indicate that nonlinear dependency optimization provides an effective and computationally efficient alternative to conventional correlation-based self-supervised learning objectives.
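Under a Gaussian assumption, one plausible reading of "a nonlinear transformation of pairwise correlations" is the closed-form mutual information of two jointly Gaussian variables with correlation ρ, I(ρ) = -½ log(1 - ρ²). Its derivative, ρ / (1 - ρ²), grows sharply as |ρ| → 1. This is one mechanism by which such a transform would weight strongly dependent pairs more heavily than weakly correlated ones.

The NumPy sketch below builds a hypothetical objective on that reading. It is not the authors' SMI loss, and it makes several assumptions:

* Embeddings are standardized row-wise.
* The diagonal of the N × N matrix represents matched views.
* A small penalty on the off-diagonal entries, with hypothetical weight `lam`, stands in for the diversity term.

The sketch computes only the loss value, with no projector or gradient step. All function names are illustrative.

```python
import numpy as np


def sample_dependency_matrix(za: np.ndarray, zb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """N x N Pearson correlations between sample i of view A and sample j of view B.

    Assumption: each embedding row is centered and scaled across its D features.
    """
    d = za.shape[1]
    za = (za - za.mean(axis=1, keepdims=True)) / (za.std(axis=1, keepdims=True) + eps)
    zb = (zb - zb.mean(axis=1, keepdims=True)) / (zb.std(axis=1, keepdims=True) + eps)
    return (za @ zb.T) / d  # entries lie in [-1, 1]


def gaussian_mi(rho: np.ndarray, clip: float = 1e-6) -> np.ndarray:
    """MI of two jointly Gaussian scalars with correlation rho: -0.5 * log(1 - rho^2)."""
    r2 = np.clip(rho ** 2, 0.0, 1.0 - clip)  # keep the log finite as |rho| -> 1
    return -0.5 * np.log1p(-r2)


def smi_style_loss(za: np.ndarray, zb: np.ndarray, lam: float = 0.005) -> float:
    """Hypothetical objective: reward dependency of matched pairs, penalize it elsewhere."""
    mi = gaussian_mi(sample_dependency_matrix(za, zb))
    n = mi.shape[0]
    matched = np.trace(mi) / n                             # diagonal: two views of one sample
    unmatched = (mi.sum() - np.trace(mi)) / (n * (n - 1))  # off-diagonal: different samples
    return float(-matched + lam * unmatched)


rng = np.random.default_rng(0)
za = rng.standard_normal((8, 128))
zb_aligned = za + 0.1 * rng.standard_normal((8, 128))  # second view close to the first
zb_random = rng.standard_normal((8, 128))              # unrelated second view
print(smi_style_loss(za, zb_aligned) < smi_style_loss(za, zb_random))  # True
```

In the usage lines at the end, the nearly identical second view yields a much lower loss than the unrelated one. This is the expected behavior for an objective that rewards dependency between matched pairs.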