Self-Guided Masked Autoencoder

2025-07-25
Citations: 0
Influential: 0
AI Summary
This work investigates the intrinsic learning mechanisms of Masked Autoencoders (MAEs) and discovers that, early in pretraining, MAEs spontaneously develop pattern-driven clustering capabilities over image patches. Leveraging this insight, we propose Self-Guided Masking, a novel masking strategy that dynamically generates semantic-aware masks from the model's intermediate features, eliminating reliance on external supervision or handcrafted priors. Unlike conventional random masking, our approach formulates mask selection as an adaptive clustering process in feature space, enabling internally driven optimization of the reconstruction objective. Evaluated on ImageNet-1K and downstream tasks including classification, detection, and segmentation, the method consistently improves transfer performance across diverse vision benchmarks. These results empirically validate both the discovered mechanistic principle and the effectiveness, robustness, and generalizability of the proposed self-guided masking paradigm.

Abstract
Masked Autoencoder (MAE) is a self-supervised approach to representation learning that is widely applicable to downstream tasks in computer vision. Despite its success, what MAE learns, and how it learns it, is still not fully understood. In this paper, through in-depth analysis, we discover that MAE intrinsically learns pattern-based patch-level clustering from surprisingly early stages of pretraining. Building on this understanding, we propose the self-guided masked autoencoder, which internally generates informed masks by exploiting its own progress in patch clustering, replacing the naive random masking of the vanilla MAE. Our approach significantly boosts the learning process without relying on any external models or supplementary information, keeping the self-supervised nature of MAE intact. Comprehensive experiments on various downstream tasks verify the effectiveness of the proposed method.
Problem

Research questions and friction points this paper is trying to address.

Understands MAE's patch clustering learning process
Proposes self-guided masking to replace random masking
Enhances MAE's learning without external models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-guided masking replaces random masking
Utilizes patch clustering for informed masks
Boosts learning without external models
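
The idea of deriving a mask from the model's own patch clustering can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `self_guided_mask`, the use of 2-means with cosine similarity, the rule of masking the larger cluster first, and the default mask ratio of 0.75 are all assumptions made for illustration. It only shows how patch features (here a plain array standing in for intermediate encoder features) might be turned into an informed mask without any external model.

```python
import numpy as np


def self_guided_mask(patch_feats, mask_ratio=0.75, n_iter=10, seed=0):
    """Return a boolean mask over patches (True = masked).

    Hypothetical illustration only: bipartition the patches with 2-means
    on L2-normalised features, mask the larger cluster first, and fill any
    remaining budget at random from the other cluster.
    """
    rng = np.random.default_rng(seed)
    n = patch_feats.shape[0]
    norms = np.linalg.norm(patch_feats, axis=1, keepdims=True)
    feats = patch_feats / (norms + 1e-8)

    # Initialise the two cluster centres from two distinct random patches.
    centers = feats[rng.choice(n, size=2, replace=False)]
    assign = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assign each patch to the centre with the highest cosine similarity.
        assign = np.argmax(feats @ centers.T, axis=1)
        for k in range(2):
            members = feats[assign == k]
            if len(members) > 0:
                c = members.mean(axis=0)
                centers[k] = c / (np.linalg.norm(c) + 1e-8)

    n_mask = int(round(mask_ratio * n))
    target = int(np.argmax(np.bincount(assign, minlength=2)))

    # Patches in the target cluster score in [1, 1.5), the rest in [0, 0.5),
    # so the target cluster is always masked first; ties are broken randomly.
    priority = (assign == target).astype(float) + 0.5 * rng.random(n)
    order = np.argsort(-priority)

    mask = np.zeros(n, dtype=bool)
    mask[order[:n_mask]] = True
    return mask


if __name__ == "__main__":
    # 196 patches (a 14x14 grid) with 64-dim features; random stand-in data.
    feats = np.random.default_rng(1).normal(size=(196, 64))
    mask = self_guided_mask(feats)
    print(mask.sum(), "of", mask.size, "patches masked")
```

In a real pipeline the features would come from the encoder being pretrained, and the mask would be recomputed as training progresses; which cluster to mask and how much of it to reveal are design choices this sketch does not settle.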