CraterBench-R: Instance-Level Crater Retrieval for Planetary Scale

📅 2026-04-05



🤖 AI Summary

This work extends crater analysis beyond detection to instance-level image retrieval, supporting scientific tasks such as catalog deduplication, cross-observation matching, and morphological analog discovery. The authors introduce CraterBench-R, a multi-scale retrieval benchmark comprising about 25,000 crater identities, and propose instance-token aggregation, a training-free method. Working from the patch tokens of self-supervised Vision Transformers, which perform best with in-domain pretraining, the method pairs multi-token late-interaction matching with cosine-similarity token clustering to reduce storage while preserving accuracy. At K=16, aggregation improves mAP by 17.9 points over raw token selection; at K=64, it matches the accuracy of using all 196 tokens. A two-stage pipeline, with single-vector shortlisting followed by instance-token reranking, recovers 89–94% of the full late-interaction accuracy while searching only a small candidate set.
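To make the two-stage idea concrete, here is a minimal NumPy sketch of single-vector shortlisting followed by multi-token reranking. It is an illustration under stated assumptions, not the authors' implementation: the global descriptor is assumed to be the mean of the patch tokens, the late-interaction score is assumed to be a ColBERT-style MaxSim averaged over query tokens, and the names `normalize`, `pool`, `maxsim`, and `two_stage_search` are hypothetical.

```python
import numpy as np

def normalize(x, eps=1e-12):
    """L2-normalize along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def pool(tokens):
    """Single-vector descriptor: normalized mean of tokens (an assumption)."""
    return normalize(tokens.mean(axis=-2))

def maxsim(q_tokens, g_tokens):
    """Assumed late-interaction score: for each query token take its best
    cosine match among the gallery tokens, then average over query tokens."""
    return float((q_tokens @ g_tokens.T).max(axis=1).mean())

def two_stage_search(q_tokens, gallery_tokens, gallery_global, shortlist=10):
    """Return gallery indices of the shortlist, reranked by MaxSim.

    q_tokens:       (Tq, D) L2-normalized query tokens
    gallery_tokens: sequence of (Tg, D) L2-normalized token sets
    gallery_global: (N, D) pooled descriptors, one per gallery image
    """
    # Stage 1: cheap cosine search over one vector per image.
    coarse = gallery_global @ pool(q_tokens)
    candidates = np.argsort(-coarse)[:shortlist]
    # Stage 2: multi-token scoring on the shortlisted candidates only.
    fine = np.array([maxsim(q_tokens, gallery_tokens[i]) for i in candidates])
    return candidates[np.argsort(-fine)]
```

In this sketch the expensive token-level comparison runs only `shortlist` times per query, so its cost does not grow with the gallery size; only the single-vector stage does.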


📝 Abstract

Impact craters are a cornerstone of planetary surface analysis. However, while most deep learning pipelines treat craters solely as a detection problem, critical scientific workflows such as catalog deduplication, cross-observation matching, and morphological analog discovery are inherently retrieval tasks. To address this, we formulate crater analysis as an instance-level image retrieval problem and introduce CraterBench-R, a curated benchmark featuring about 25,000 crater identities with multi-scale gallery views and manually verified queries spanning diverse scales and contexts. Our baseline evaluations across various architectures reveal that self-supervised Vision Transformers (ViTs), particularly those with in-domain pretraining, dominate the task, outperforming generic models with significantly more parameters. Furthermore, we demonstrate that retaining multiple ViT patch tokens for late-interaction matching dramatically improves accuracy over standard single-vector pooling. However, storing all tokens per image is operationally inefficient at a planetary scale. To close this efficiency gap, we propose instance-token aggregation, a scalable, training-free method that selects K seed tokens, assigns the remaining tokens to these seeds via cosine similarity, and aggregates each cluster into a single representative token. This approach yields substantial gains: at K=16, aggregation improves mAP by 17.9 points over raw token selection, and at K=64, it matches the accuracy of using all 196 tokens with significantly less storage. Finally, we demonstrate that a practical two-stage pipeline, with single-vector shortlisting followed by instance-token reranking, recovers 89-94% of the full late-interaction accuracy while searching only a small candidate set. The benchmark is publicly available at hf.co/datasets/jfang/CraterBench-R.
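The three steps of instance-token aggregation described in the abstract (select K seed tokens, assign the remaining tokens to seeds by cosine similarity, aggregate each cluster into one token) can be sketched as follows. The abstract does not say how seeds are chosen or how a cluster is reduced, so two choices here are assumptions made only for illustration: seeds are picked by greedy farthest-point selection in cosine space, and each cluster is reduced to its normalized mean. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """L2-normalize along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def select_seeds(tokens, k):
    """Greedy farthest-point selection (an assumption, not the paper's rule).

    Starts from token 0 and repeatedly adds the token whose highest
    cosine similarity to the already chosen seeds is lowest.
    """
    seeds = [0]
    max_sim = tokens @ tokens[0]
    for _ in range(k - 1):
        masked = max_sim.copy()
        masked[seeds] = np.inf          # never pick the same token twice
        nxt = int(np.argmin(masked))
        seeds.append(nxt)
        max_sim = np.maximum(max_sim, tokens @ tokens[nxt])
    return np.array(seeds)

def aggregate_tokens(tokens, k):
    """Reduce (N, D) patch tokens to (K, D) aggregated tokens."""
    tokens = l2_normalize(np.asarray(tokens, dtype=np.float64))
    n, d = tokens.shape
    assert 1 <= k <= n, "k must be between 1 and the number of tokens"
    seeds = select_seeds(tokens, k)
    # Cosine similarity of every token to every seed, shape (N, K).
    sims = tokens @ tokens[seeds].T
    assign = np.argmax(sims, axis=1)
    assign[seeds] = np.arange(k)        # each seed anchors its own cluster
    # Sum the members of each cluster; normalizing the sum gives the same
    # direction as normalizing the mean.
    out = np.zeros((k, d))
    np.add.at(out, assign, tokens)
    return l2_normalize(out)
```

For a ViT producing 196 patch tokens per image, `aggregate_tokens(tokens, 64)` would keep 64 vectors, roughly a third of the per-image token storage, and the reduced sets can be scored with the same late-interaction matching as the full token sets.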

Problem

Research questions and friction points this paper is trying to address.

crater retrieval

instance-level retrieval

planetary surface analysis

image retrieval benchmark

impact craters

Innovation

Methods, ideas, or system contributions that make the work stand out.

instance-level retrieval

Vision Transformers

token aggregation