🤖 AI Summary
Existing remote sensing benchmarks predominantly focus on classification or segmentation tasks in specific geographic regions, and lack standardized datasets for evaluating pixel-level regression of forest aboveground biomass (AGB) at global scale. To address this gap, we introduce a globally distributed regression benchmark, covering seven continental regions, that combines EnMAP hyperspectral imagery with GEDI lidar-derived AGB maps. The benchmark addresses gaps in both geographic coverage and task type (regression). We compare Vision Transformer-based geospatial foundation models (Geo-FMs) against a baseline U-Net and show that the token patch size strongly affects pixel-level regression performance. Experimental results demonstrate that Geo-FMs can match or, in some cases, surpass the U-Net, especially when the encoder is fine-tuned, and that the performance gap depends on the dataset size for each region. The dataset and code will be publicly released to support generalization analysis and fair benchmarking of geospatial foundation models.
📝 Abstract
Comprehensive evaluation of geospatial foundation models (Geo-FMs) requires benchmarking across diverse tasks, sensors, and geographic regions. However, most existing benchmark datasets are limited to segmentation or classification tasks and focus on specific geographic areas. To address this gap, we introduce a globally distributed dataset for forest aboveground biomass (AGB) estimation, a pixel-wise regression task. This benchmark dataset combines co-located hyperspectral imagery (HSI) from the Environmental Mapping and Analysis Program (EnMAP) satellite and AGB density predictions derived from the Global Ecosystem Dynamics Investigation (GEDI) lidars, covering seven continental regions. Our experimental results on this dataset demonstrate that the evaluated Geo-FMs can match or, in some cases, surpass the performance of a baseline U-Net, especially when fine-tuning the encoder. We also find that the performance difference between the U-Net and Geo-FMs depends on the dataset size for each region, and we highlight the importance of the token patch size in the Vision Transformer backbone for accurate predictions in pixel-wise regression tasks. By releasing this globally distributed hyperspectral benchmark dataset, we aim to facilitate the development and evaluation of Geo-FMs for HSI applications. The dataset additionally enables research into the geographic bias and generalization capacity of Geo-FMs. The dataset and source code will be made publicly available.
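To make the role of the token patch size concrete, the following minimal sketch computes how a Vision Transformer divides a square tile into tokens. It only illustrates the general arithmetic: a patch size of P yields (H/P) × (W/P) tokens, and each token must be decoded back into P × P per-pixel values for a dense regression output. The tile size of 128 pixels and the helper name `token_grid` are illustrative assumptions, not values or code from the paper.

```python
def token_grid(tile_size: int, patch_size: int) -> dict:
    """Describe the token layout of a square tile for a given ViT patch size.

    Both arguments are in pixels. `tile_size` is a hypothetical input size,
    not one reported in the abstract.
    """
    if tile_size % patch_size != 0:
        raise ValueError("tile_size must be divisible by patch_size")
    tokens_per_side = tile_size // patch_size
    return {
        "patch_size": patch_size,
        "tokens_per_side": tokens_per_side,
        # Total number of tokens the transformer attends over.
        "num_tokens": tokens_per_side ** 2,
        # Number of per-pixel regression targets each token has to account for.
        "pixels_per_token": patch_size ** 2,
    }


TILE_SIZE = 128  # assumed tile edge length in pixels (illustrative only)

for patch_size in (4, 8, 16):
    print(token_grid(TILE_SIZE, patch_size))
```

Under these assumptions, a patch size of 16 leaves 64 tokens that each cover 256 pixels, while a patch size of 4 gives 1024 tokens that each cover 16 pixels. Smaller patches keep finer spatial detail per token at the cost of a longer token sequence, which is one reason the patch size can matter for pixel-wise targets such as AGB.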