Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach

📅 2025-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Over 75 vision foundation models have emerged for remote sensing, yet their cross-task performance is inconsistent, and conventional evaluation requires task-specific fine-tuning, which makes benchmarking costly and inefficient. Method: We propose a novel “capability encoding” paradigm that enables zero-shot performance prediction across diverse downstream tasks without fine-tuning. It learns task-agnostic, low-dimensional capability representations via feature-space alignment and employs a lightweight linear regressor to map these representations to predicted task accuracies. Contribution/Results: Our approach drastically reduces benchmarking overhead and accelerates model selection. For the first time, we construct a capability atlas for remote sensing foundation models, providing an interpretable, transferable analytical framework for systematic evaluation, comparative analysis, and evolutionary tracking of model capabilities.
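The last step described above, a lightweight linear regressor that maps capability representations to predicted task accuracies, can be illustrated with a small sketch. It assumes each model already has a low-dimensional capability vector; how those vectors are learned is not shown here. The function names, the ridge penalty, and the synthetic data are all hypothetical and chosen only for illustration; this is not the authors' implementation, which is available in the linked repository.

```python
import numpy as np


def add_bias(capabilities):
    """Append a constant column so the regressor can learn an intercept."""
    ones = np.ones((capabilities.shape[0], 1))
    return np.hstack([capabilities, ones])


def fit_linear_predictor(capabilities, accuracies, l2=1e-6):
    """Closed-form ridge regression from capability vectors to task accuracy.

    Hypothetical helper: `l2` is an assumed small penalty for numerical stability.
    """
    X = add_bias(capabilities)
    gram = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ accuracies)


def predict_accuracy(weights, capabilities):
    """Predict task accuracy for models that were never fine-tuned on the task."""
    return add_bias(capabilities) @ weights


# Synthetic stand-in data (NOT real benchmark results):
# 12 hypothetical models, each with a 3-dimensional capability vector.
rng = np.random.default_rng(0)
capabilities = rng.normal(size=(12, 3))
true_w = np.array([0.05, -0.02, 0.03])
accuracies = 0.7 + capabilities @ true_w

# Pretend only the first 9 models were fine-tuned and evaluated on the new task.
train, held_out = slice(0, 9), slice(9, 12)
weights = fit_linear_predictor(capabilities[train], accuracies[train])

# Predict the remaining 3 models and pick the most promising one.
predicted = predict_accuracy(weights, capabilities[held_out])
best = 9 + int(np.argmax(predicted))
print("predicted accuracies:", predicted)
print("index of best predicted model:", best)
```

In this toy setting the accuracies are exactly linear in the capability vectors, so the held-out predictions match closely; with real benchmark scores the fit would be noisier and would need validation against measured results.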

📝 Abstract

Foundation models constitute a significant advancement in computer vision: after a single, albeit costly, training phase, they can address a wide array of tasks. In the field of Earth observation, over 75 remote sensing vision foundation models have been developed in the past four years. However, none has consistently outperformed the others across all available downstream tasks. To facilitate their comparison, we propose a cost-effective method for predicting a model's performance on multiple downstream tasks without the need for fine-tuning on each one. This method is based on what we call "capabilities encoding." The utility of this novel approach is twofold: we demonstrate its potential to simplify the selection of a foundation model for a given new task, and we employ it to offer a fresh perspective on the existing literature, suggesting avenues for future research. Code is available at https://github.com/pierreadorni/capabilities-encoding.

Problem

Research questions and friction points this paper is trying to address.

Predicting model performance on downstream tasks without fine-tuning on each one

Simplifying foundation model selection for new tasks

Providing new insights on remote sensing foundation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Capabilities encoding predicts model performance

Avoids fine-tuning on multiple downstream tasks

Simplifies foundation model selection process

🔎 Similar Papers

FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring