🤖 AI Summary
Legacy scientific codebases, such as those written in Fortran, are hard to integrate with modern differentiable frameworks, which limits the use of gradient-based methods for parameter estimation and data assimilation. This work proposes a five-phase agentic pipeline driven by a large language model (LLM) that combines static dependency analysis, iterative compile-repair loops, validation against a Fortran reference oracle, and JAX automatic differentiation to produce an automated, numerically validated, differentiable translation of a large Earth system model component: the 19,000-line Fortran land surface model CLM-ml-v2. The translated model computes the full Jacobian in a single backward pass, recovers physical parameters in eight times fewer steps than gradient-free optimization, and achieves a 24× wall-clock speedup over sequential Fortran at an ensemble size of 2,048. The model and pipeline are released as a reusable framework for differentiating other Earth system model components.
📝 Abstract
Differentiable programming offers transformative capabilities for scientific modeling, enabling gradient-based parameter estimation, sensitivity analysis, and data assimilation. Yet migrating legacy codebases into differentiable frameworks remains a challenge. We present a five-phase agentic pipeline based on a large language model (LLM) that translates legacy Fortran into JAX: static dependency analysis determines module translation order from the full call graph; iterative compile-repair loops correct errors autonomously; and a Fortran reference oracle enforces numerical parity at the module level before integration and gradient verification. We instantiate and evaluate the pipeline on CLM-ml-v2, a 19,000-line Fortran land surface model, and analyze agent behavior across 73 module translation tasks. The resulting differentiable model computes the complete Jacobian in a single backward pass, recovers physical parameters in eight times fewer steps than gradient-free optimization, and achieves a 24-fold wall-clock speedup over sequential Fortran at ensemble size N=2,048. Both the translated model and the pipeline infrastructure are released as a reusable framework for differentiating other Earth system model components.
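To make the three capabilities named in the abstract concrete, the following is a minimal JAX sketch, not the released model. `toy_model`, its three parameters, and `target` are hypothetical stand-ins for a translated land surface component and its observations. The sketch shows a reverse-mode Jacobian with `jax.jacrev`, the gradient of a scalar misfit with `jax.grad` (the quantity a gradient-based parameter search would use), and batched ensemble evaluation with `jax.vmap` under `jax.jit`.

```python
import jax
import jax.numpy as jnp


def toy_model(params):
    """Hypothetical stand-in for a translated model: 3 parameters -> 2 outputs."""
    a, b, c = params[0], params[1], params[2]
    return jnp.stack([a * b + jnp.sin(c), a ** 2 - b * c])


params = jnp.array([1.0, 2.0, 0.5])

# Reverse-mode Jacobian of all outputs with respect to all parameters.
# Result has shape (n_outputs, n_params) = (2, 3).
jac = jax.jacrev(toy_model)(params)

# Scalar misfit against hypothetical observations; jax.grad returns its
# gradient with respect to every parameter from one reverse-mode sweep.
target = jnp.array([1.0, 0.0])


def loss(p):
    return jnp.sum((toy_model(p) - target) ** 2)


grad = jax.grad(loss)(params)

# Ensemble evaluation: vmap maps the model over a batch of parameter
# vectors, and jit compiles the batched function once.
ensemble = params + 0.01 * jnp.arange(8.0)[:, None]  # shape (8, 3)
batched_outputs = jax.jit(jax.vmap(toy_model))(ensemble)  # shape (8, 2)
```

Note that `jax.jacrev` evaluates one vector-Jacobian product per output and vectorizes them, so its cost grows with the number of outputs. For a scalar objective such as `loss`, `jax.grad` needs a single backward pass regardless of the number of parameters. How the paper's model arranges its outputs to obtain the complete Jacobian in one pass is not specified in the abstract.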