AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping

Date: 2025-05-27
AI Summary
Existing remote sensing foundation models for crop mapping either rely on fixed spatiotemporal windows or neglect temporal dynamics altogether, which limits their ability to capture multi-scale phenological growth patterns. To address this, we propose AgriFM, an agriculture-oriented, multi-source, time-series remote sensing foundation model. It introduces a modified Video Swin Transformer with synchronized spatiotemporal downsampling for hierarchical joint spatiotemporal feature encoding, and a dynamic fusion decoder that processes long-term, multi-source time series from MODIS, Landsat-8/9, and Sentinel-2 in a unified way. AgriFM is pre-trained on over 25 million global samples, using land cover products as supervision. On multiple crop mapping benchmarks, it outperforms state-of-the-art deep learning methods and general-purpose remote sensing foundation models. The code will be made publicly available.

πŸ“ Abstract
Accurate crop mapping fundamentally relies on modeling multi-scale spatiotemporal patterns, where spatial scales range from individual field textures to landscape-level context, and temporal scales capture both short-term phenological transitions and full growing-season dynamics. Transformer-based remote sensing foundation models (RSFMs) show promise for crop mapping because of their inherent capacity for unified spatiotemporal processing. However, current RSFMs remain suboptimal for crop mapping: they either employ fixed spatiotemporal windows that ignore the multi-scale nature of crop systems or disregard temporal information entirely by focusing solely on spatial patterns. To bridge these gaps, we present AgriFM, a multi-source remote sensing foundation model specifically designed for agricultural crop mapping. Our approach begins by establishing the necessity of simultaneous hierarchical spatiotemporal feature extraction, which leads to a modified Video Swin Transformer architecture in which temporal down-sampling is synchronized with spatial scaling operations. This modified backbone enables efficient, unified processing of long time-series satellite inputs. AgriFM leverages temporally rich data streams from three satellite sources (MODIS, Landsat-8/9, and Sentinel-2) and is pre-trained on a globally representative dataset of over 25 million image samples supervised by land cover products. The resulting framework incorporates a versatile decoder architecture that dynamically fuses these learned spatiotemporal representations, supporting diverse downstream tasks. Comprehensive evaluations demonstrate AgriFM's superior performance over conventional deep learning approaches and state-of-the-art general-purpose RSFMs across all downstream tasks. Code will be available at https://github.com/flyakon/AgriFM.
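
The abstract's efficiency argument can be made concrete by tracking feature-map shapes through a hierarchical encoder. The sketch below illustrates the idea under stated assumptions and is not AgriFM's code. The (2, 4, 4) patch size, the 96-channel first stage, the four stages, the factor-2 merges, the two-token temporal window, and the helper names `stage_shapes` and `window_time_span` are all hypothetical choices; the patch and channel values mirror the common Video Swin-T configuration. With `sync_temporal=True`, every merge halves T together with H and W. With `False`, T is left unchanged, as in standard Video Swin patch merging, which only combines 2×2 spatial neighbours.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StageShape:
    """Token grid (T, H, W) and channel width C at one encoder stage."""
    t: int
    h: int
    w: int
    c: int

    @property
    def tokens(self) -> int:
        return self.t * self.h * self.w


def stage_shapes(
    t: int,
    h: int,
    w: int,
    patch: Tuple[int, int, int] = (2, 4, 4),  # assumed (time, height, width) patch size
    embed_dim: int = 96,                       # assumed first-stage channel width
    num_stages: int = 4,                       # assumed number of stages
    sync_temporal: bool = True,
) -> List[StageShape]:
    """Track feature-map shapes through a hierarchical 3D encoder.

    Each merge halves H and W and doubles C (Swin-style patch merging).
    If sync_temporal is True, T is halved in the same step (synchronized
    down-sampling); otherwise T is kept, as in standard Video Swin merging.
    Assumes all sizes divide evenly at every stage.
    """
    pt, ph, pw = patch
    shape = StageShape(t // pt, h // ph, w // pw, embed_dim)
    shapes = [shape]
    for _ in range(num_stages - 1):
        new_t = shape.t // 2 if sync_temporal else shape.t
        shape = StageShape(new_t, shape.h // 2, shape.w // 2, shape.c * 2)
        shapes.append(shape)
    return shapes


def window_time_span(stage: int, window_t: int = 2, patch_t: int = 2,
                     sync_temporal: bool = True) -> int:
    """Original time steps covered by a window of window_t tokens at a 0-based stage."""
    stride = patch_t * (2 ** stage if sync_temporal else 1)
    return window_t * stride


if __name__ == "__main__":
    for sync in (True, False):
        print("synchronized" if sync else "spatial-only")
        for i, s in enumerate(stage_shapes(32, 256, 256, sync_temporal=sync)):
            print(f"  stage {i + 1}: T={s.t} H={s.h} W={s.w} C={s.c} tokens={s.tokens}")
```

For a 32-step, 256×256 input under these assumptions, the last stage holds 2×8×8 = 128 tokens with synchronized down-sampling versus 16×8×8 = 1,024 without. A fixed two-token temporal window also spans 32 original time steps at the last stage instead of 4. This is one way to see how a single window size can cover short phenological transitions in early stages and season-long dynamics in later ones.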

Problem

Research questions and friction points this paper is trying to address.

Modeling multi-scale spatiotemporal patterns for crop mapping
Overcoming limitations of current remote sensing foundation models
Integrating multi-source satellite data for accurate agricultural analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modified Video Swin Transformer for spatiotemporal processing
Multi-source satellite data fusion for crop mapping
Hierarchical spatiotemporal feature extraction architecture
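
The list above names multi-source fusion as a contribution, but this page does not spell out how AgriFM's decoder combines the MODIS, Landsat-8/9, and Sentinel-2 streams. As a generic illustration only, the sketch below aligns per-source feature maps to the finest grid with nearest-neighbour upsampling and then takes a softmax-weighted sum. The function names, the integer `scale` factors, and the fixed `logits` (standing in for weights a decoder might predict per sample) are hypothetical; the actual AgriFM decoder may work quite differently.

```python
import math
from typing import Dict, List

Grid = List[List[float]]  # single-channel 2D feature map: rows x cols


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor along both axes."""
    return [
        [value for value in row for _col in range(factor)]  # repeat each value
        for row in grid
        for _row in range(factor)                            # repeat each row
    ]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_sources(features: Dict[str, Grid], scale: Dict[str, int],
                 logits: Dict[str, float]) -> Grid:
    """Align per-source maps to one grid, then take a softmax-weighted sum.

    scale[name] is a hypothetical integer factor that brings that source onto
    the finest grid; logits[name] stands in for decoder-predicted weights.
    """
    names = sorted(features)
    weights = dict(zip(names, softmax([logits[n] for n in names])))
    aligned = {n: upsample_nearest(features[n], scale[n]) for n in names}
    rows, cols = len(aligned[names[0]]), len(aligned[names[0]][0])
    for n in names:
        if len(aligned[n]) != rows or len(aligned[n][0]) != cols:
            raise ValueError(f"source {n!r} does not align with the common grid")
    return [
        [sum(weights[n] * aligned[n][i][j] for n in names) for j in range(cols)]
        for i in range(rows)
    ]
```

For example, `fuse_sources({"sentinel2": [[1.0, 2.0], [3.0, 4.0]], "landsat": [[10.0]]}, {"sentinel2": 1, "landsat": 2}, {"sentinel2": 0.0, "landsat": 0.0})` gives each source a weight of 0.5 and returns `[[5.5, 6.0], [6.5, 7.0]]`. In a real pipeline the factor between sources would follow their native resolutions (10 m Sentinel-2 bands versus 30 m Landsat bands is a factor of 3), and bilinear or learned upsampling is a common alternative to nearest-neighbour copying.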
πŸ”Ž Similar Papers
No similar papers found.

Authors

Wenyuan Li
Jockey Club STEM Lab of Quantitative Remote Sensing, Department of Geography, The University of Hong Kong, Hong Kong, China

Shunlin Liang
Chair Professor, Department of Geography, The University of Hong Kong

Keyan Chen
Department of Aerospace Intelligent Science and Technology, School of Astronautics, Beihang University, Beijing, China

Yongzhe Chen
Jockey Club STEM Lab of Quantitative Remote Sensing, Department of Geography, The University of Hong Kong, Hong Kong, China

Han Ma
The Chinese University of Hong Kong

Jianglei Xu
Jockey Club STEM Lab of Quantitative Remote Sensing, Department of Geography, The University of Hong Kong, Hong Kong, China

Yichuan Ma
Fudan University

Shikang Guan
Jockey Club STEM Lab of Quantitative Remote Sensing, Department of Geography, The University of Hong Kong, Hong Kong, China

Husheng Fang
School of Remote Sensing and Information Engineering, Wuhan University, China

Zhenwei Shi
Department of Aerospace Intelligent Science and Technology, School of Astronautics, Beihang University, Beijing, China