AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing remote sensing foundation models for crop mapping either use fixed spatiotemporal windows or neglect temporal dynamics altogether, which limits their ability to capture multi-scale phenological growth patterns. To address this, we propose AgriFM, an agriculture-oriented, multi-source, time-series remote sensing foundation model. Its backbone is a modified Video Swin Transformer in which temporal downsampling is synchronized with spatial downsampling, giving hierarchical joint spatiotemporal feature encoding of long time series from MODIS, Landsat-8/9, and Sentinel-2. A versatile decoder dynamically fuses the learned spatiotemporal representations to support diverse downstream tasks. AgriFM is pre-trained on a globally representative dataset of over 25 million image samples, supervised by land cover products. Across the downstream crop mapping tasks evaluated, it outperforms conventional deep learning approaches and state-of-the-art general-purpose remote sensing foundation models. The code is to be released at https://github.com/flyakon/AgriFM.


📝 Abstract

Accurate crop mapping fundamentally relies on modeling multi-scale spatiotemporal patterns, where spatial scales range from individual field textures to landscape-level context, and temporal scales capture both short-term phenological transitions and full growing-season dynamics. Transformer-based remote sensing foundation models (RSFMs) are promising for crop mapping because of their innate capacity for unified spatiotemporal processing. However, current RSFMs remain suboptimal for crop mapping: they either employ fixed spatiotemporal windows that ignore the multi-scale nature of crop systems, or disregard temporal information entirely by focusing solely on spatial patterns. To bridge these gaps, we present AgriFM, a multi-source remote sensing foundation model specifically designed for agricultural crop mapping. Our approach begins by establishing the necessity of simultaneous hierarchical spatiotemporal feature extraction, which leads to a modified Video Swin Transformer architecture in which temporal downsampling is synchronized with spatial scaling operations. This modified backbone enables efficient, unified processing of long satellite time series. AgriFM leverages temporally rich data streams from three satellite sources (MODIS, Landsat-8/9, and Sentinel-2) and is pre-trained on a globally representative dataset comprising over 25 million image samples supervised by land cover products. The resulting framework incorporates a versatile decoder architecture that dynamically fuses these learned spatiotemporal representations, supporting diverse downstream tasks. Comprehensive evaluations demonstrate AgriFM's superior performance over conventional deep learning approaches and state-of-the-art general-purpose RSFMs across all downstream tasks. Code will be available at https://github.com/flyakon/AgriFM.
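The key architectural idea, halving the temporal axis every time the spatial axes are halved, can be illustrated with a small NumPy sketch. It assumes a 2x2x2 patch-merging step (concatenate the eight neighbours in time, height and width, then apply a linear projection), by analogy with Swin's 2x2 spatial patch merging. The function name `synchronized_patch_merging`, the random projection weights and the input size (32 time steps, 64x64 tokens, 16 channels) are hypothetical and are not taken from the AgriFM code.

```python
import numpy as np


def synchronized_patch_merging(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: halve T, H and W together by merging 2x2x2 cells.

    x:      feature volume of shape (T, H, W, C); T, H and W must be even.
    weight: projection matrix of shape (8 * C, C_out).
    """
    t, h, w, c = x.shape
    if t % 2 or h % 2 or w % 2:
        raise ValueError("T, H and W must all be even")
    # Split each axis into (blocks, 2) so every 2x2x2 neighbourhood is a group.
    cells = x.reshape(t // 2, 2, h // 2, 2, w // 2, 2, c)
    # Move the three size-2 offsets next to the channels and flatten them:
    # each output token concatenates its 8 neighbours -> 8 * C features.
    cells = cells.transpose(0, 2, 4, 1, 3, 5, 6).reshape(t // 2, h // 2, w // 2, 8 * c)
    # Linear projection to the next stage's width (normalization omitted).
    return cells @ weight


rng = np.random.default_rng(0)
# Hypothetical input: 32 time steps, 64x64 spatial tokens, 16 channels.
feats = rng.standard_normal((32, 64, 64, 16))
shapes = [feats.shape]
for stage in range(3):
    c = feats.shape[-1]
    # Random stand-in for learned weights; doubles the channel width per stage.
    proj = rng.standard_normal((8 * c, 2 * c)) / np.sqrt(8 * c)
    feats = synchronized_patch_merging(feats, proj)
    shapes.append(feats.shape)

print(shapes)
# [(32, 64, 64, 16), (16, 32, 32, 32), (8, 16, 16, 64), (4, 8, 8, 128)]
```

Each stage halves T, H and W together, so a token's temporal extent grows in step with its spatial extent. For contrast, patch merging in the original Video Swin Transformer downsamples only the spatial axes, leaving the temporal length unchanged across stages.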

Problem

Research questions and friction points this paper is trying to address.

Modeling multi-scale spatiotemporal patterns for crop mapping

Overcoming limitations of current remote sensing foundation models

Integrating multi-source satellite data for accurate agricultural analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modified Video Swin Transformer for spatiotemporal processing

Multi-source satellite data fusion for crop mapping

Hierarchical spatiotemporal feature extraction architecture
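The abstract does not say how the decoder weights its inputs, so the following is only a minimal sketch of one plausible form of dynamic multi-source fusion, under stated assumptions: each source (Sentinel-2, Landsat-8/9, MODIS) has already been encoded into a square (H, W, C) feature map with a shared channel width C, coarser maps are aligned to the finest grid by nearest-neighbour upsampling, and a hypothetical scoring vector `score` produces per-pixel softmax weights over the sources. The helper names, grid sizes and random inputs are illustrative and do not reflect the sensors' true resolution ratios or the AgriFM decoder.

```python
import numpy as np


def upsample_nearest(feat: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor."""
    return np.repeat(np.repeat(feat, factor, axis=0), factor, axis=1)


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_sources(features: dict, target_hw: int, score: np.ndarray):
    """Hypothetical sketch: fuse per-source maps with per-pixel, data-dependent weights."""
    aligned = []
    for name, feat in features.items():
        if target_hw % feat.shape[0]:
            raise ValueError(f"{name}: grid size must divide the target size")
        aligned.append(upsample_nearest(feat, target_hw // feat.shape[0]))
    stack = np.stack(aligned, axis=0)                 # (S, H, W, C)
    logits = stack @ score                            # (S, H, W): one score per source and pixel
    weights = softmax(logits, axis=0)                 # sums to 1 over sources at each pixel
    fused = (weights[..., None] * stack).sum(axis=0)  # (H, W, C)
    return fused, weights


rng = np.random.default_rng(0)
C = 8
feats = {
    "sentinel2": rng.standard_normal((24, 24, C)),  # finest (illustrative) grid
    "landsat": rng.standard_normal((8, 8, C)),      # 3x coarser
    "modis": rng.standard_normal((2, 2, C)),        # 12x coarser
}
fused, weights = fuse_sources(feats, target_hw=24, score=rng.standard_normal(C))
print(fused.shape, weights.shape)  # (24, 24, 8) (3, 24, 24)
```

Because the weights are computed from the features themselves, the mix of sources can vary from pixel to pixel. A real decoder would more likely use a learned attention or gating module than a single scoring vector.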
