🤖 AI Summary
The migration of scientific computing code from Fortran to C++ suffers from a scarcity of high-quality, domain-specific training data for large language models (LLMs).
Method: This paper proposes a dual-agent LLM collaboration framework integrating an iterative compile-execute-repair loop to generate functionally correct Fortran-to-C++ translation dialogues. Leveraging open-source LLMs, it incorporates compiler feedback-driven code repair and cross-platform executability validation, with rigorous evaluation via CodeBLEU.
Contribution/Results: We construct the largest publicly available, functionally verified Fortran2CPP multi-turn dialogue dataset to date. Fine-tuned models achieve up to a 3.31× improvement in CodeBLEU score and a 92% increase in compilation success rate on independent benchmarks. All data, models, and code are fully open-sourced, establishing the first high-fidelity, verifiable benchmark and methodological paradigm specifically for Fortran-to-C++ translation.
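The compile-execute-repair loop at the heart of the pipeline can be sketched generically. This is a minimal illustration, not the paper's implementation: `toy_compile` and `toy_repair` are hypothetical stand-ins for the real compiler invocation and the LLM repair agent.

```python
def repair_loop(code, compile_fn, repair_fn, max_iters=5):
    """Iterative compile-repair loop: try to compile the translated C++;
    on failure, feed the compiler diagnostics back to a repair agent
    (an LLM in the paper's pipeline) and retry, up to max_iters rounds."""
    for attempt in range(max_iters):
        ok, diagnostics = compile_fn(code)
        if ok:
            return code, attempt  # accepted code and repair rounds used
        code = repair_fn(code, diagnostics)
    return None, max_iters  # gave up: code never compiled

# Toy stand-ins (hypothetical): a real pipeline would shell out to a C++
# compiler for compile_fn and prompt an LLM with the error for repair_fn.
def toy_compile(code):
    if "return 0;" in code:
        return True, ""
    return False, "error: missing return statement"

def toy_repair(code, diagnostics):
    if "missing return" in diagnostics:
        return code.replace("}", "return 0; }", 1)
    return code

fixed, rounds = repair_loop("int main() { }", toy_compile, toy_repair)
```

In the actual pipeline the loop additionally executes the compiled binary and compares outputs against the Fortran original, so "success" means functional correctness, not just compilation.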
📝 Abstract
Migrating Fortran code to C++ is a common task for many scientific computing teams, driven by the need to leverage modern programming paradigms, enhance cross-platform compatibility, and improve maintainability. Automating this translation process using large language models (LLMs) has shown promise, but the lack of high-quality, specialized datasets has hindered their effectiveness. In this paper, we address this challenge by introducing a novel multi-turn dialogue dataset, Fortran2CPP, specifically designed for Fortran-to-C++ code migration. Our dataset, significantly larger than existing alternatives, is generated using a unique LLM-driven, dual-agent pipeline incorporating iterative compilation, execution, and code repair to ensure high quality and functional correctness. To demonstrate the effectiveness of our dataset, we fine-tuned several open-weight LLMs on Fortran2CPP and evaluated their performance on two independent benchmarks. Fine-tuning on our dataset led to remarkable gains, with models achieving up to a 3.31x increase in CodeBLEU score and a 92% improvement in compilation success rate. This highlights the dataset's ability to enhance both the syntactic accuracy and compilability of the translated C++ code. Our dataset and model have been open-sourced and are available on our public GitHub repository (https://github.com/HPC-Fortran2CPP/Fortran2Cpp).