AI Summary
This study addresses the challenge of efficiently identifying and interpreting morphology-related, human-interpretable features in Euclid Q1 galaxy images: features that extend beyond the Galaxy Zoo decision-tree framework and are embedded within pretrained neural networks.
Method: We propose a feature-disentanglement approach based on sparse autoencoders (SAEs), applied to both the supervised Zoobot model and a self-supervised masked autoencoder (MAE), to extract monosemantic, semantically meaningful galaxy-morphology representations from Euclid Q1 data.
Contribution/Results: Compared with conventional dimensionality-reduction methods (e.g., PCA), SAE-learned features align significantly better with Galaxy Zoo labels and uncover structural patterns not defined in existing astronomical taxonomies. The released MAE model achieves superhuman image reconstruction performance. To our knowledge, this work is the first systematic effort to mine interpretable galaxy morphology features from pretrained vision models, establishing a new paradigm for intelligent analysis of astronomical imagery.
Abstract
Sparse Autoencoders (SAEs) can efficiently identify candidate monosemantic features from pretrained neural networks for galaxy morphology. We demonstrate this on Euclid Q1 images using both a supervised model (Zoobot) and a new self-supervised masked autoencoder (MAE). Our publicly released MAE achieves superhuman image reconstruction performance. While a Principal Component Analysis (PCA) on the supervised model primarily identifies features already aligned with the Galaxy Zoo decision tree, SAEs can identify interpretable features outside of this framework. SAE features also show stronger alignment than PCA with Galaxy Zoo labels. Although challenges in interpretability remain, SAEs provide a powerful engine for discovering astrophysical phenomena beyond the confines of human-defined classification.
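To make the core idea concrete, here is a minimal NumPy sketch of the SAE setup the abstract describes: an overcomplete encoder with a ReLU nonlinearity and an L1 sparsity penalty, trained to reconstruct activations from a pretrained model. All dimensions, weight scales, and the `l1_coeff` value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d = embedding dimension of a pretrained model
# (e.g. Zoobot or MAE activations), m = overcomplete SAE dictionary size (m > d).
d, m = 16, 64

W_enc = rng.normal(scale=0.1, size=(d, m))  # encoder weights
b_enc = np.zeros(m)                         # encoder bias
W_dec = rng.normal(scale=0.1, size=(m, d))  # linear decoder (feature dictionary)

def sae_forward(x, l1_coeff=1e-3):
    """One SAE forward pass: sparse codes, reconstruction, and total loss."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU -> nonnegative, sparse codes
    x_hat = z @ W_dec                        # reconstruct the input embedding
    recon = np.mean((x - x_hat) ** 2)        # reconstruction error
    sparsity = l1_coeff * np.abs(z).mean()   # L1 term pushes most codes to zero
    return z, x_hat, recon + sparsity

x = rng.normal(size=(8, d))                  # batch of 8 stand-in embeddings
z, x_hat, loss = sae_forward(x)
print(z.shape, x_hat.shape)                  # (8, 64) (8, 16)
```

Each row of `W_dec` plays the role of one candidate feature direction; the sparsity penalty means only a few fire per galaxy, which is what makes individual features plausible candidates for monosemantic, human-interpretable morphology concepts.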