🤖 AI Summary
This study addresses two challenges in predicting soil organic carbon (SOC) cycling: data-driven models are hard to interpret, and the parameters of mechanistic models are hard to identify. To bridge this gap, we propose a Biogeochemistry-Informed Neural Network (BINN), which vectorizes the CLM5 soil carbon cycle model and embeds it in a neural network architecture, combining data-driven learning with biogeochemical constraints. In a parameter recovery experiment on synthetic data, BINN retrieves biogeochemical parameter values with high accuracy. Applied to 25,925 observed SOC profiles across the conterminous United States, BINN retrieves six major soil carbon processes whose spatial patterns agree with those from the PRODA approach, with an average correlation coefficient of 0.81. BINN is also more than 50 times more computationally efficient than PRODA. This work couples a process-based model with deep learning end to end, offering a pathway toward interpretable, process-grounded Earth system modeling.
📝 Abstract
Big data and the rapid development of artificial intelligence (AI) provide unprecedented opportunities to enhance our understanding of the global carbon cycle and other biogeochemical processes. However, retrieving mechanistic knowledge from big data remains a challenge. Here, we develop a Biogeochemistry-Informed Neural Network (BINN) that seamlessly integrates a vectorized process-based soil carbon cycle model (i.e., Community Land Model version 5, CLM5) into a neural network (NN) structure to examine mechanisms governing soil organic carbon (SOC) storage from big data. BINN demonstrates high accuracy in retrieving biogeochemical parameter values from synthetic data in a parameter recovery experiment. We use BINN to predict six major processes regulating the soil carbon cycle (or components in process-based models) from 25,925 observed SOC profiles across the conterminous US and compare them with the same processes previously retrieved by a Bayesian inference-based PROcess-guided deep learning and DAta-driven modeling (PRODA) approach (Tao et al. 2020; 2023). The high agreement between the spatial patterns of the processes retrieved by the two approaches, with an average correlation coefficient of 0.81, confirms BINN's ability to retrieve mechanistic knowledge from big data. Additionally, the integration of neural networks and process-based models in BINN improves computational efficiency by more than 50 times over PRODA. We conclude that BINN is a transformative tool that harnesses the power of both AI and process-based modeling, facilitating new scientific discoveries while improving the interpretability and accuracy of Earth system models.
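The idea of placing a process model inside a network, and checking it with a parameter recovery experiment, can be illustrated with a deliberately small toy. The sketch below is not CLM5 and not the BINN implementation: it assumes a one-pool steady-state carbon model, a single logistic unit standing in for the network, and hypothetical values for the carbon input, parameter bounds, and "true" weights. It only shows the general pattern: covariates go in, a bounded biogeochemical parameter comes out, the process model converts it to SOC, and the mismatch with observed SOC drives training.

```python
import math
import random

# All constants below are hypothetical and chosen only for illustration.
K_MIN, K_MAX = 0.01, 1.0   # assumed bounds on the turnover rate k (1/yr)
U = 0.5                    # assumed constant carbon input
W_TRUE, B_TRUE = 1.5, -0.5 # "true" weights used to generate synthetic data


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_k(w, b, x):
    """Stand-in 'network': one logistic unit scaled into [K_MIN, K_MAX]."""
    return K_MIN + (K_MAX - K_MIN) * sigmoid(w * x + b)


def process_model(k):
    """Toy one-pool steady state: dC/dt = U - k*C = 0, so C = U / k."""
    return U / k


def train(xs, soc_obs, steps=3000, lr=0.1):
    """Gradient descent on the squared error of log SOC, through the process model."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, c_obs in zip(xs, soc_obs):
            s = sigmoid(w * x + b)
            k = K_MIN + (K_MAX - K_MIN) * s
            err = math.log(process_model(k)) - math.log(c_obs)
            # Chain rule: d(loss)/dz = 2*err * d(log C)/dk * dk/dz
            dz = 2.0 * err * (-1.0 / k) * (K_MAX - K_MIN) * s * (1.0 - s)
            grad_w += dz * x
            grad_b += dz
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


# Parameter recovery experiment on synthetic, noise-free data.
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(100)]   # one synthetic covariate
k_true = [predict_k(W_TRUE, B_TRUE, x) for x in xs]
soc_obs = [process_model(k) for k in k_true]

w_hat, b_hat = train(xs, soc_obs)
k_hat = [predict_k(w_hat, b_hat, x) for x in xs]
r = pearson(k_true, k_hat)
print(f"recovered w={w_hat:.3f}, b={b_hat:.3f}, correlation with true k: {r:.4f}")
```

Because the turnover rate is an explicit intermediate quantity rather than a hidden activation, it can be inspected and compared against an independent estimate, which is the kind of comparison the abstract reports between BINN and PRODA.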