Transformers vs. Recurrent Models for Estimating Forest Gross Primary Production

📅 2025-11-14

🤖 AI Summary

Problem: Accurate spatiotemporal monitoring of gross primary production (GPP) across large forest ecosystems remains challenging, because eddy covariance towers cover little area and most remote sensing approaches rely on single-sensor spectral indices and statistical models. Method: This study predicts GPP with deep learning time-series models that take multimodal inputs (Sentinel-2 optical, MODIS land surface temperature, Sentinel-1 SAR, and solar radiation), and systematically compares a Transformer (GPT-2) with an LSTM. Results: Both models achieve comparable overall accuracy. LSTM reaches that accuracy with shorter input windows (≤30 days), whereas GPT-2 performs better under extreme climatic conditions (e.g., droughts and heatwaves). Radiation is the dominant predictor, followed by Sentinel-2, MODIS land surface temperature, and Sentinel-1. The work shows how model architecture, context length, and multimodal remote sensing inputs jointly determine GPP prediction accuracy, and the findings are intended to guide deep learning frameworks for monitoring terrestrial carbon dynamics.

📝 Abstract

Monitoring the spatiotemporal dynamics of forest CO$_2$ uptake (Gross Primary Production, GPP) remains a central challenge in terrestrial ecosystem research. While Eddy Covariance (EC) towers provide high-frequency estimates, their limited spatial coverage constrains large-scale assessments. Remote sensing offers a scalable alternative, yet most approaches rely on single-sensor spectral indices and statistical models that are often unable to capture the complex temporal dynamics of GPP. Recent advances in deep learning (DL) and data fusion offer new opportunities to better represent the temporal dynamics of vegetation processes, but comparative evaluations of state-of-the-art DL models for multimodal GPP prediction remain scarce. Here, we explore the performance of two representative models for predicting GPP from multivariate inputs: 1) GPT-2, a transformer architecture, and 2) Long Short-Term Memory (LSTM), a recurrent neural network. The two models achieve similar accuracy: LSTM performs better overall, whereas GPT-2 excels during extreme events. Analysis of temporal context length further reveals that LSTM attains similar accuracy using substantially shorter input windows than GPT-2, highlighting an accuracy-efficiency trade-off between the two architectures. Feature importance analysis reveals radiation as the dominant predictor, followed by Sentinel-2, MODIS land surface temperature, and Sentinel-1 contributions. Our results demonstrate how model architecture, context length, and multimodal inputs jointly determine performance in GPP prediction, guiding future developments of DL frameworks for monitoring terrestrial carbon dynamics.
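The abstract does not say which feature importance method was used. As one common option, the sketch below shows grouped permutation importance: the columns belonging to one data source are shuffled across samples, and the resulting increase in RMSE is taken as that source's importance. Everything here is an assumption made for illustration: the function names, the toy linear "model" with coefficients 2.0 and 0.1, and the two-column layout are hypothetical and are not results or code from the paper.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def group_permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    X: List[Row],
    y: Sequence[float],
    groups: Dict[str, List[int]],
    n_repeats: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean RMSE increase when all columns of one group are shuffled together."""
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    scores: Dict[str, float] = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            # Copy the data, then overwrite the group's columns with values
            # taken from a randomly chosen other sample.
            X_perm = [list(row) for row in X]
            for i, j in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[j][c]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        scores[name] = sum(increases) / n_repeats
    return scores


# Toy data (hypothetical): column 0 stands in for radiation, column 1 for Sentinel-1.
data_rng = random.Random(1)
X_toy = [[data_rng.random(), data_rng.random()] for _ in range(50)]


def toy_predict(rows: List[Row]) -> List[float]:
    # Invented stand-in for a trained model; it depends mostly on column 0.
    return [2.0 * r[0] + 0.1 * r[1] for r in rows]


y_toy = toy_predict(X_toy)
toy_groups = {"radiation": [0], "sentinel1": [1]}
scores = group_permutation_importance(toy_predict, X_toy, y_toy, toy_groups)
print(scores)
```

Grouping columns by sensor, rather than scoring each band separately, matches the way the abstract ranks whole data sources. For windowed time-series inputs, the same idea would apply by shuffling a source's channels across samples for the whole window.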

Problem

Research questions and friction points this paper is trying to address.

Compare transformer and recurrent models for forest carbon uptake prediction

Address limited spatial coverage of ground-based CO2 monitoring systems

Evaluate multimodal deep learning approaches for vegetation temporal dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer and LSTM models predict forest GPP

Multimodal remote sensing data inputs used

Model performance varies with temporal context length
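To make "temporal context length" concrete, the following minimal sketch builds fixed-length input windows from a daily multivariate series, so that each GPP value is paired with the preceding days of predictors. It is an illustration under stated assumptions, not the paper's preprocessing: the function name `make_windows`, the choice to include the target day in the window, and the toy four-feature layout are all hypothetical.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    context_len: int,
) -> Tuple[List[Window], List[float]]:
    """Pair target[t] with the context_len feature rows ending at day t (inclusive)."""
    if context_len < 1:
        raise ValueError("context_len must be at least 1")
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    windows: List[Window] = []
    labels: List[float] = []
    # The first usable day is the one with a full window of history behind it.
    for t in range(context_len - 1, len(features)):
        window = [list(row) for row in features[t - context_len + 1 : t + 1]]
        windows.append(window)
        labels.append(target[t])
    return windows, labels


# Toy series (hypothetical): 10 days, 4 predictors per day.
toy_features = [[float(day), 0.0, 0.0, 0.0] for day in range(10)]
toy_gpp = [float(day) for day in range(10)]

short_windows, short_labels = make_windows(toy_features, toy_gpp, context_len=3)
long_windows, long_labels = make_windows(toy_features, toy_gpp, context_len=7)
print(len(short_windows), len(long_windows))
```

A longer context gives the model more history per prediction but leaves fewer training samples from the same series and costs more compute per sample, which is one way to read the accuracy-efficiency trade-off described in the abstract.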
