Systematic LLM Translation of Legacy Scientific Code to Differentiable Frameworks: Application to a Land Surface Model

📅 2026-06-04
🤖 AI Summary
Legacy scientific codebases, such as those written in Fortran, are difficult to integrate with modern differentiable frameworks, which limits the use of gradient-based methods for parameter estimation and data assimilation. This work proposes a five-phase, large language model (LLM)-based agentic pipeline that translates legacy Fortran into JAX, combining static dependency analysis, iterative compile-repair loops, and validation against a Fortran reference oracle before integration and gradient verification. The pipeline is applied to CLM-ml-v2, a 19,000-line Fortran land surface model. The resulting differentiable model computes the complete Jacobian in a single backward pass, recovers physical parameters in eight times fewer steps than gradient-free optimization, and achieves a 24× wall-clock speedup over sequential Fortran at an ensemble size of N=2,048.
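To make the Jacobian and ensemble claims concrete, here is a minimal JAX sketch. It does not use the translated CLM-ml-v2 code: `toy_model`, its three parameters, and the scalar `forcing` are hypothetical stand-ins chosen only for illustration. It shows how `jax.jacrev` obtains a full Jacobian by reverse-mode differentiation and how `jax.vmap` evaluates an ensemble of parameter sets in one batched call.

```python
import jax
import jax.numpy as jnp


def toy_model(params, forcing):
    """Hypothetical stand-in for a translated model: 3 parameters -> 2 outputs."""
    a, b, c = params[0], params[1], params[2]
    flux = a * forcing + b * jnp.tanh(c * forcing)
    storage = a * b + c * forcing**2
    return jnp.stack([flux, storage])


params = jnp.array([0.5, 1.5, 2.0])
forcing = 0.3

# Reverse-mode Jacobian with respect to params. jacrev vectorizes the
# backward pass over the output components, giving a (2, 3) matrix.
jac = jax.jacrev(toy_model)(params, forcing)

# Ensemble evaluation: perturb the parameters and map the model over the
# leading axis. The ensemble size 2048 mirrors the N quoted in the text.
key = jax.random.PRNGKey(0)
ensemble = params + 0.01 * jax.random.normal(key, (2048, 3))
batched_model = jax.jit(jax.vmap(toy_model, in_axes=(0, None)))
outputs = batched_model(ensemble, forcing)  # shape (2048, 2)
```

The same pattern applies to any pure JAX function, so a translated model exposes gradients and batched execution without separate adjoint or ensemble code.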
📝 Abstract
Differentiable programming offers transformative capabilities for scientific modeling, enabling gradient-based parameter estimation, sensitivity analysis, and data assimilation. Yet, migrating legacy codebases into differentiable frameworks remains a challenge. We present a five-phase LLM-based agentic pipeline that translates legacy Fortran into JAX: static dependency analysis determines module translation order from the full call graph; iterative compile-repair loops correct errors autonomously; and a Fortran reference oracle enforces numerical parity at the module level before integration and gradient verification. We instantiate and evaluate the pipeline on CLM-ml-v2, a 19,000-line Fortran land surface model, and analyze agent behavior across 73 module translation tasks. The resulting differentiable model computes the complete Jacobian in a single backward pass, recovers physical parameters in eight times fewer steps than gradient-free optimization, and achieves a 24 times wall-clock speedup over sequential Fortran at ensemble size N=2,048. Both the translated model and pipeline infrastructure are released as a reusable framework for differentiating other Earth system model components.
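The dependency-ordering step can be illustrated with a short sketch. The abstract does not describe how the paper's static analysis is implemented, so this is only an assumed approach: a topological sort over a module dependency graph using Python's standard-library `graphlib`. The module names in `deps` are hypothetical and are not taken from CLM-ml-v2.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each module lists the modules it uses.
deps = {
    "driver": {"canopy_fluxes", "soil_temperature"},
    "canopy_fluxes": {"leaf_photosynthesis", "math_tools"},
    "leaf_photosynthesis": {"math_tools"},
    "soil_temperature": {"math_tools"},
    "math_tools": set(),
}

# static_order() yields every module after all of its dependencies, so
# leaf modules are translated first and the top-level driver last.
# A circular dependency raises graphlib.CycleError.
translation_order = list(TopologicalSorter(deps).static_order())

for position, module in enumerate(translation_order, start=1):
    print(position, module)
```

Translating in this order means each module's dependencies already exist in JAX, and can be checked against the reference oracle, before the module itself is translated.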
Problem

Research questions and friction points this paper is trying to address.

differentiable programming
legacy scientific code
code translation
land surface model
gradient-based optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

differentiable programming
LLM-based translation
legacy Fortran migration
JAX
scientific code modernization