🤖 AI Summary
Deep learning models for global Earth observation often suffer performance degradation under distribution shift, particularly in data-scarce regions. Existing out-of-distribution (OOD) detection methods either require access to OOD samples during training or compromise in-distribution (ID) task accuracy, which limits practical deployment. This paper proposes TARDIS, a post-hoc OOD detection framework that needs no labeled OOD data and leaves the pre-trained model's ID task performance unchanged. Its core innovation is to cluster and compare the pre-trained model's internal activations in the ID feature space, using the ID data to split unlabeled data from an unknown distribution (WILD) into surrogate ID and surrogate OOD labels, which then train a lightweight binary classifier. Evaluated across 17 covariate- and semantic-shift setups on EuroSAT and xBD, TARDIS reaches near-upper-bound surrogate-labeling performance in 13 cases and matches top post-hoc activation- and scoring-based methods in detection performance. Deployed at scale on the Fields of the World dataset, it also yields actionable insights into the pre-trained model's behavior.
📝 Abstract
Training robust deep learning models is crucial in Earth Observation, where globally deployed models often face distribution shifts that degrade performance, especially in low-data regions. Out-of-distribution (OOD) detection addresses this by identifying inputs that deviate from in-distribution (ID) data. However, existing methods either assume access to OOD data or compromise primary task performance, limiting real-world use. We introduce TARDIS, a post-hoc OOD detection method designed for scalable geospatial deployment. Our core innovation lies in generating surrogate distribution labels by leveraging ID data within the feature space. TARDIS takes a pre-trained model, ID data, and data from an unknown distribution (WILD), separates WILD into surrogate ID and OOD labels based on internal activations, and trains a binary classifier to detect distribution shifts. We validate on EuroSAT and xBD across 17 setups covering covariate and semantic shifts, showing near-upper-bound surrogate labeling performance in 13 cases and matching the performance of top post-hoc activation- and scoring-based methods. Finally, deploying TARDIS on Fields of the World reveals actionable insights into pre-trained model behavior at scale. The code is available at https://github.com/microsoft/geospatial-ood-detection.
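The pipeline in the abstract (activations for ID and WILD data, surrogate labels for WILD, then a binary classifier) can be illustrated with a minimal NumPy sketch. This is not the TARDIS implementation; the linked repository is the reference for that. The sketch assumes the pre-trained model's internal activations have already been extracted as fixed-length feature vectors. It swaps in a simple k-nearest-neighbour distance rule for the paper's own WILD-splitting procedure and uses plain logistic regression as the binary classifier. The function names and the parameters `k`, `id_frac` and `ood_frac` are hypothetical, and the demo uses synthetic features rather than EuroSAT, xBD or any real model.

```python
import numpy as np


def surrogate_labels(id_feats, wild_feats, k=5, id_frac=0.3, ood_frac=0.3):
    """Split WILD features into surrogate-ID (label 0) and surrogate-OOD (label 1).

    Assumed rule (not TARDIS's own): score each WILD sample by its mean
    Euclidean distance to its k nearest ID samples; the closest id_frac become
    surrogate ID, the farthest ood_frac become surrogate OOD, and the
    ambiguous middle band is discarded.
    """
    # Pairwise distances between WILD and ID features, shape (n_wild, n_id)
    dists = np.linalg.norm(wild_feats[:, None, :] - id_feats[None, :, :], axis=-1)
    knn_score = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    lo, hi = np.quantile(knn_score, [id_frac, 1.0 - ood_frac])
    near = wild_feats[knn_score <= lo]  # most ID-like WILD samples
    far = wild_feats[knn_score >= hi]   # least ID-like WILD samples
    x = np.concatenate([near, far])
    y = np.concatenate([np.zeros(len(near)), np.ones(len(far))])
    return x, y


def train_logistic(x, y, lr=0.1, epochs=500):
    """Full-batch gradient-descent logistic regression; returns a P(OOD) function."""
    mu, sd = x.mean(axis=0), x.std(axis=0) + 1e-8  # standardise features
    xs = (x - mu) / sd
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(xs @ w + b)))  # predicted P(OOD)
        err = p - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * xs.T @ err / len(y)
        b -= lr * err.mean()

    def predict_ood_prob(feats):
        z = ((feats - mu) / sd) @ w + b
        return 1.0 / (1.0 + np.exp(-z))

    return predict_ood_prob


# Toy demo with synthetic "activations" (no real model or dataset involved).
rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1.0, size=(200, 8))
wild_feats = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 8)),  # ID-like WILD samples
    rng.normal(4.0, 1.0, size=(100, 8)),  # shifted WILD samples
])
x, y = surrogate_labels(id_feats, wild_feats)
detector = train_logistic(x, y)
scores = detector(wild_feats)
print(f"mean P(OOD): ID-like {scores[:100].mean():.2f}, shifted {scores[100:].mean():.2f}")
```

Because the classifier only reads frozen features, the pre-trained model itself is never updated, which is what lets a post-hoc detector of this kind leave ID task performance untouched. Discarding the ambiguous middle band trades fewer surrogate labels for cleaner ones; the split fractions here are arbitrary illustrative choices.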