Transformers vs. Recurrent Models for Estimating Forest Gross Primary Production

📅 2025-11-14
🤖 AI Summary
Accurate spatiotemporal monitoring of gross primary productivity (GPP) across large-scale forest ecosystems remains challenging due to limitations in current remote sensing–based approaches. Method: This study proposes a deep learning–based temporal modeling framework leveraging multimodal satellite data—including Sentinel-2 optical, MODIS land surface temperature, Sentinel-1 SAR, and solar radiation—to predict GPP. We systematically compare the predictive performance of Transformer (GPT-2) and LSTM architectures. Results: Both models achieve comparable overall accuracy; however, LSTM attains high precision with shorter input windows (≤30 days), whereas GPT-2 significantly outperforms LSTM under extreme climatic conditions (e.g., droughts and heatwaves). Crucially, this work is the first to elucidate the interplay among model architecture, context length, and multimodal remote sensing data in governing GPP prediction accuracy. The findings establish an interpretable, robust deep learning paradigm for dynamic terrestrial carbon cycle monitoring.

📝 Abstract
Monitoring the spatiotemporal dynamics of forest CO$_2$ uptake (Gross Primary Production, GPP) remains a central challenge in terrestrial ecosystem research. While Eddy Covariance (EC) towers provide high-frequency estimates, their limited spatial coverage constrains large-scale assessments. Remote sensing offers a scalable alternative, yet most approaches rely on single-sensor spectral indices and statistical models that are often unable to capture the complex temporal dynamics of GPP. Recent advances in deep learning (DL) and data fusion offer new opportunities to better represent the temporal dynamics of vegetation processes, but comparative evaluations of state-of-the-art DL models for multimodal GPP prediction remain scarce. Here, we explore the performance of two representative models for predicting GPP: 1) GPT-2, a transformer architecture, and 2) Long Short-Term Memory (LSTM), a recurrent neural network, using multivariate inputs. Both achieve similar overall accuracy; however, while LSTM performs slightly better overall, GPT-2 excels during extreme events. Analysis of temporal context length further reveals that LSTM attains similar accuracy using substantially shorter input windows than GPT-2, highlighting an accuracy-efficiency trade-off between the two architectures. Feature importance analysis reveals radiation as the dominant predictor, followed by Sentinel-2, MODIS land surface temperature, and Sentinel-1 contributions. Our results demonstrate how model architecture, context length, and multimodal inputs jointly determine performance in GPP prediction, guiding future developments of DL frameworks for monitoring terrestrial carbon dynamics.
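The temporal-context analysis described in the abstract implies a standard sliding-window setup: each training sample is a fixed-length block of daily multimodal features that predicts GPP on the window's final day. The sketch below (a minimal illustration, not the authors' code; function and variable names are hypothetical, and the data is random) shows how such windows could be built for a given context length:

```python
import numpy as np

def make_windows(features: np.ndarray, gpp: np.ndarray, context: int):
    """Build (sample, time, feature) windows of length `context`,
    each predicting the GPP value on the window's final day.

    features : (n_days, n_features) daily multimodal inputs
               (e.g. Sentinel-2 reflectance, MODIS LST,
               Sentinel-1 backscatter, solar radiation)
    gpp      : (n_days,) daily GPP target, e.g. from EC towers
    """
    X = np.stack([features[t - context:t] for t in range(context, len(gpp))])
    y = gpp[context:]
    return X, y

# Toy data: 100 days, 4 feature groups, 30-day context (the regime in
# which the paper reports the LSTM is already accurate).
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 4))
gpp = rng.normal(size=100)
X, y = make_windows(feats, gpp, context=30)
print(X.shape, y.shape)  # (70, 30, 4) (70,)
```

Sweeping `context` over a range of window lengths and retraining each model on the resulting datasets is one straightforward way to reproduce the kind of accuracy-vs-context-length comparison the paper reports.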
Problem

Research questions and friction points this paper is trying to address.

Compare transformer and recurrent models for forest carbon uptake prediction
Address limited spatial coverage of ground-based CO2 monitoring systems
Evaluate multimodal deep learning approaches for vegetation temporal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer and LSTM models predict forest GPP
Multimodal remote sensing data inputs used
Model performance varies with temporal context length
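The abstract attributes the dominant predictive role to radiation via a feature importance analysis. One common, model-agnostic way to obtain such rankings is permutation importance: shuffle one input feature at a time and measure how much the prediction error grows. The sketch below is a generic illustration of that technique (the paper does not specify its exact attribution method, and all names here are hypothetical), using a toy linear model in which feature 0 stands in for radiation:

```python
import numpy as np

def permutation_importance(model_fn, X, y, rng):
    """Increase in MSE when each feature column is shuffled.

    model_fn : callable mapping X (n_samples, n_features) -> predictions
    Returns one importance score per feature; larger = more important.
    """
    base_mse = np.mean((model_fn(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break this feature's link to y
        scores.append(np.mean((model_fn(Xp) - y) ** 2) - base_mse)
    return np.array(scores)

# Toy linear "model" where feature 0 dominates, mirroring the paper's
# finding that radiation is the strongest predictor.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
w = np.array([3.0, 1.0, 0.5, 0.1])
y = X @ w
imp = permutation_importance(lambda X: X @ w, X, y, rng)
print(imp.argmax())  # feature 0 ranks highest
```

In the paper's setting, the feature columns would correspond to the radiation, Sentinel-2, MODIS LST, and Sentinel-1 input groups, and `model_fn` to the trained LSTM or GPT-2 predictor.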
👥 Authors
David Montero
Leipzig University, Institute for Earth System Science and Remote Sensing (Remote Sensing, Machine Learning, Geomatics, Data Science, Google Earth Engine)
Miguel D. Mahecha
IEF, Leipzig University
Francesco Martinuzzi
MPI PKS
César Aybar
IPL, Universitat de València
Anne Klosterhalfen
Bioclimatology, University of Göttingen
Alexander Knohl
Bioclimatology, University of Göttingen
Jesús Anaya
GEMA, Universidad de Medellín
Clemens Mosig
IEF, Leipzig University
Sebastian Wieneke
IEF, Leipzig University