🤖 AI Summary
Accurate spatiotemporal monitoring of gross primary productivity (GPP) across large-scale forest ecosystems remains challenging due to limitations in current remote sensing–based approaches. Method: This study proposes a deep learning–based temporal modeling framework that uses multimodal satellite data (Sentinel-2 optical, MODIS land surface temperature, Sentinel-1 SAR, and solar radiation) to predict GPP, and systematically compares the predictive performance of Transformer (GPT-2) and LSTM architectures. Results: Both models achieve comparable overall accuracy; however, LSTM attains high precision with shorter input windows (≤30 days), whereas GPT-2 outperforms LSTM under extreme climatic conditions (e.g., droughts and heatwaves). The work clarifies how model architecture, context length, and multimodal remote sensing data together govern GPP prediction accuracy, and the findings inform robust deep learning approaches for dynamic terrestrial carbon cycle monitoring.
📝 Abstract
Monitoring the spatiotemporal dynamics of forest CO$_2$ uptake (Gross Primary Production, GPP) remains a central challenge in terrestrial ecosystem research. While Eddy Covariance (EC) towers provide high-frequency estimates, their limited spatial coverage constrains large-scale assessments. Remote sensing offers a scalable alternative, yet most approaches rely on single-sensor spectral indices and statistical models that often fail to capture the complex temporal dynamics of GPP. Recent advances in deep learning (DL) and data fusion offer new opportunities to better represent the temporal dynamics of vegetation processes, but comparative evaluations of state-of-the-art DL models for multimodal GPP prediction remain scarce. Here, using multivariate inputs, we compare two representative models for predicting GPP: (1) GPT-2, a transformer architecture, and (2) Long Short-Term Memory (LSTM), a recurrent neural network. Both models achieve similar accuracy; LSTM performs better overall, whereas GPT-2 excels during extreme events. Analysis of temporal context length further reveals that LSTM attains similar accuracy using substantially shorter input windows than GPT-2, highlighting an accuracy-efficiency trade-off between the two architectures. Feature importance analysis identifies radiation as the dominant predictor, followed by Sentinel-2, MODIS land surface temperature, and Sentinel-1 contributions. Our results demonstrate how model architecture, context length, and multimodal inputs jointly determine performance in GPP prediction, guiding future development of DL frameworks for monitoring terrestrial carbon dynamics.
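To make the notion of temporal context length concrete, the following minimal sketch shows one way a multivariate daily time series could be cut into fixed-length input windows for a sequence model such as an LSTM or GPT-2. It is an illustration under stated assumptions, not the authors' pipeline: the feature names, the `make_windows` helper, the toy values, and the choice to include the target day's features in each window are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-day feature order; names are illustrative, not the paper's.
FEATURES = ("radiation", "s2_index", "modis_lst", "s1_backscatter")


def make_windows(
    features: Sequence[Sequence[float]],
    gpp: Sequence[float],
    context_len: int,
) -> List[Tuple[List[List[float]], float]]:
    """Pair each day's GPP with the `context_len` days of features ending on that day."""
    if context_len < 1:
        raise ValueError("context_len must be at least 1")
    if len(features) != len(gpp):
        raise ValueError("features and gpp must cover the same days")
    samples = []
    for end in range(context_len - 1, len(gpp)):
        start = end - context_len + 1
        # Assumption: the window includes the target day's own features.
        window = [list(row) for row in features[start:end + 1]]
        samples.append((window, gpp[end]))
    return samples


# Toy series: 6 days x 4 features, with made-up values.
days = [[float(d), d * 0.1, 280.0 + d, -10.0 - d] for d in range(6)]
gpp = [0.5 * d for d in range(6)]

short = make_windows(days, gpp, context_len=2)
long = make_windows(days, gpp, context_len=4)
print(len(short), len(long))  # prints "5 3"
```

A longer context gives each sample more history but yields fewer samples per series and larger inputs, which is one side of the accuracy-efficiency trade-off the abstract describes.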