🤖 AI Summary
This work addresses the challenge of robust source morphology analysis in next-generation radio astronomical surveys, where heterogeneity across telescopes and imaging pipelines degrades model performance. To this end, the authors propose STRADAViT, a framework built upon the Vision Transformer architecture that incorporates a radio-aware view generation mechanism and a two-stage continued self-supervised pretraining strategy. By leveraging multi-survey data—including MeerKAT, ASKAP, LOFAR/LoTSS, and SKA—the method learns a transferable radio image encoder. Evaluated on the MiraBest, LoTSS DR2, and Radio Galaxy Zoo morphology benchmarks, STRADAViT improves Macro-F1 over its initialization in all linear-probe settings and most fine-tuning settings, with selective but positive gains over strong DINOv2 baselines; the largest improvement appears on RGZ DR1, and the released checkpoint delivers competitive transfer at lower computational cost.
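The radio-aware view generation mentioned above can be sketched as a pair of stochastic augmentations applied to one survey cutout. This is a hypothetical minimal illustration, not the paper's exact recipe: the function name `radio_aware_views`, the crop size, and the jitter range are assumptions; the rotations and flips reflect the general fact that radio sources have no preferred on-sky orientation.

```python
import numpy as np

def radio_aware_views(cutout, rng, crop=448):
    """Generate two augmented views of one radio cutout (hypothetical sketch).

    Rotations/flips exploit the absence of a canonical source orientation;
    random crops keep the central source while varying its context;
    flux-scale jitter mimics calibration differences across surveys.
    """
    def one_view():
        h, w = cutout.shape
        y = rng.integers(0, h - crop + 1)
        x = rng.integers(0, w - crop + 1)
        v = cutout[y:y + crop, x:x + crop]        # random crop
        v = np.rot90(v, k=int(rng.integers(4)))   # random 90-degree rotation
        if rng.random() < 0.5:
            v = np.fliplr(v)                      # random mirror flip
        return v * float(rng.uniform(0.8, 1.2))   # flux-scale jitter
    return one_view(), one_view()

rng = np.random.default_rng(0)
cutout = rng.normal(size=(512, 512))  # stand-in for a 512x512 survey cutout
v1, v2 = radio_aware_views(cutout, rng)
```

In a contrastive pretraining stage, the two views of the same cutout would form a positive pair, while views of other cutouts in the batch serve as negatives.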
📝 Abstract
Next-generation radio astronomy surveys are producing millions of resolved sources, but robust morphology analysis remains difficult across heterogeneous telescopes and imaging pipelines. We present STRADAViT, a self-supervised Vision Transformer continued-pretraining framework for transferable radio astronomy image encoders. STRADAViT combines a mixed-survey pretraining dataset, radio astronomy-aware view generation, and controlled continued pretraining through reconstruction-only, contrastive-only, and two-stage branches. Pretraining uses 512×512 radio astronomy cutouts from MeerKAT, ASKAP, LOFAR/LoTSS, and SKA data. We evaluate transfer with linear probing and fine-tuning on three morphology benchmarks: MiraBest, LoTSS DR2, and Radio Galaxy Zoo. Relative to the initialization used for continued pretraining, the best two-stage STRADAViT models improve Macro-F1 in all reported linear-probe settings and in most fine-tuning settings, with the largest gain on RGZ DR1. Relative to strong DINOv2 baselines, gains are selective but remain positive on LoTSS DR2 and RGZ DR1 under linear probing, and on MiraBest and RGZ DR1 under fine-tuning. A targeted DINOv2-initialized HCL ablation further shows that the adaptation recipe is not specific to a single starting point. The released STRADAViT checkpoint remains the preferred model because it offers competitive transfer at lower token count and downstream cost than the DINOv2-based alternative. These results show that radio astronomy-aware view generation and staged continued pretraining provide a stronger starting point than out-of-the-box Vision Transformers for radio astronomy transfer.
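The linear-probing protocol and the Macro-F1 metric used in the evaluation can be illustrated with a small self-contained sketch. This is an assumption-laden toy example: the synthetic features stand in for frozen-encoder embeddings, and the least-squares linear head is one simple choice of probe, not necessarily the paper's.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def linear_probe(feats, labels, n_classes):
    """Fit one linear layer (plus bias) on frozen features by least squares."""
    Y = np.eye(n_classes)[labels]                      # one-hot targets
    X = np.hstack([feats, np.ones((len(feats), 1))])   # append bias column
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return lambda f: np.argmax(
        np.hstack([f, np.ones((len(f), 1))]) @ W, axis=1)

# Toy stand-in for frozen-encoder features of three morphology classes.
rng = np.random.default_rng(1)
n, d, k = 300, 16, 3
labels = rng.integers(k, size=n)
means = rng.normal(size=(k, d)) * 3.0   # well-separated class centroids
feats = means[labels] + rng.normal(size=(n, d))

predict = linear_probe(feats, labels, k)
score = macro_f1(labels, predict(feats), k)
```

The encoder stays frozen throughout: only the linear head is trained, so the probe's Macro-F1 directly measures how linearly separable the pretrained representations are.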