🤖 AI Summary
Remote sensing foundation models often suffer from a modality mismatch between unimodal pretraining and downstream multimodal tasks, while full-model fine-tuning is computationally expensive and generalizes poorly in few-shot scenarios. To address this, we propose MAPEX, a framework built on a Mixture-of-Experts (MoE) architecture that introduces modality-aware expert pruning and modality-conditioned token routing. These mechanisms enable on-demand activation of sparse, task-adaptive subnetworks tailored to the input modalities. By pretraining on multimodal remote sensing data and selecting experts dynamically, MAPEX alleviates the modality mismatch. Experiments across multiple remote sensing benchmarks show that MAPEX outperforms both fully supervised methods and existing foundation models while using 37%–62% fewer parameters, and that it is more efficient to fine-tune and easier to adapt to a specific task.
📝 Abstract
Remote sensing data is commonly used for tasks such as flood mapping, wildfire detection, or land-use studies. For each task, scientists carefully choose appropriate modalities or leverage data from purpose-built instruments. Recent work on remote sensing foundation models pre-trains computer vision models on large amounts of remote sensing data. These large-scale models tend to focus on specific modalities, often optical RGB or multispectral data. For many important applications, this introduces a mismatch between the application modalities and the pre-training data. Moreover, the large size of foundation models makes them expensive and difficult to fine-tune on the typically small datasets available for each task. We address this mismatch with MAPEX, a remote sensing foundation model based on mixture-of-modality experts. MAPEX is pre-trained on multi-modal remote sensing data with a novel modality-conditioned token routing mechanism that elicits modality-specific experts. To apply the model to a specific task, we propose a modality-aware pruning technique, which retains only the experts specialized for the task modalities. This yields efficient modality-specific models while simplifying fine-tuning and deployment for the modalities of interest. We experimentally validate MAPEX on diverse remote sensing datasets and show strong performance compared to fully supervised training and state-of-the-art remote sensing foundation models. Code is available at https://github.com/HSG-AIML/MAPEX.
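The abstract does not specify how routing or pruning is implemented, so the following is only a minimal sketch of the general idea and not the MAPEX code. It assumes a top-k softmax router whose logits receive a per-modality bias, and a pruning rule that keeps experts receiving at least a minimum share of the tokens from the task modalities. All function names, the bias-based conditioning, the modality labels, and the `min_share` threshold are hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gate_weights, modality_bias, top_k=1):
    """Return (expert_index, weight) pairs for one token.

    gate_weights: one weight vector per expert (a linear gate).
    modality_bias: one bias per expert for the token's modality.
    Adding a bias is an assumed form of modality conditioning.
    """
    logits = [
        sum(w * x for w, x in zip(weights, token)) + bias
        for weights, bias in zip(gate_weights, modality_bias)
    ]
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    chosen = order[:top_k]
    norm = sum(probs[e] for e in chosen)
    return [(e, probs[e] / norm) for e in chosen]


def expert_usage(tokens_by_modality, gate_weights, bias_by_modality, top_k=1):
    """Count how often each expert is selected, separately per modality."""
    usage = {}
    for modality, tokens in tokens_by_modality.items():
        counts = [0] * len(gate_weights)
        for token in tokens:
            routed = route_token(token, gate_weights,
                                 bias_by_modality[modality], top_k)
            for expert, _ in routed:
                counts[expert] += 1
        usage[modality] = counts
    return usage


def experts_to_keep(usage, task_modalities, min_share=0.05):
    """Keep experts that receive at least min_share of the routed tokens
    of any task modality (a hypothetical pruning criterion)."""
    keep = set()
    for modality in task_modalities:
        counts = usage[modality]
        total = sum(counts)
        for expert, count in enumerate(counts):
            if total and count / total >= min_share:
                keep.add(expert)
    return sorted(keep)


# Toy setup: 4 experts, 3-dimensional tokens, two made-up modalities.
rng = random.Random(0)
gate = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
bias = {"optical": [5.0, 5.0, 0.0, 0.0], "sar": [0.0, 0.0, 5.0, 5.0]}
tokens = {
    name: [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]
    for name in bias
}

usage = expert_usage(tokens, gate, bias)
kept = experts_to_keep(usage, task_modalities=["sar"])
print("usage per modality:", usage)
print("experts kept for a SAR-only task:", kept)
```

In this toy setup the biases are large enough that SAR tokens only ever reach experts 2 and 3, so pruning for a SAR-only task discards experts 0 and 1. In a trained model the specialization would have to emerge from pre-training rather than from hand-set biases.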