Large Language Model-Powered Agent for C to Rust Code Translation

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenges of scarce parallel corpora, ill-defined translation steps, and difficult verification in C-to-Rust migration, this paper proposes LAC2R, an LLM-based agent system that uses Monte Carlo Tree Search (MCTS) to plan, diagnose, and refine translations autonomously. Its core contribution is the Virtual Fuzzing-based equivalence Test (VFT), an annotation-free intermediate step that checks behavioral equivalence by guiding the LLM to find input arguments on which the original C function and its Rust translation diverge, without requiring human-labeled ground truth. LAC2R also integrates a reasoning-diagnosis-refinement closed loop into systems programming language migration for the first time. Evaluated on a large-scale benchmark of real-world C functions, LAC2R substantially improves translation accuracy and memory safety: all generated Rust code is 100% compilable, behaviorally equivalent to the original C implementations, and free from undefined behavior and memory-safety violations.

📝 Abstract
The C programming language has been foundational in building system-level software. However, its manual memory management model frequently leads to memory safety issues. In response, Rust, a modern systems programming language, has emerged as a memory-safe alternative. Driven by rapid advances in the generative capabilities of LLMs, automating C-to-Rust translation for large volumes of legacy C code is attracting growing interest. Despite some success, existing LLM-based approaches have confined LLMs to static prompt-response behavior and have not explored their agentic problem-solving capability. Applying LLM agents to C-to-Rust translation introduces distinct challenges, because this task differs from traditional LLM agent applications such as math or commonsense QA. First, the scarcity of parallel C-to-Rust datasets hinders the retrieval of suitable translation exemplars for in-context learning. Second, unlike math or commonsense QA, the intermediate steps required for C-to-Rust translation are not well defined. Third, it remains unclear how to organize and cascade these intermediate steps into a correct translation trajectory. To address these challenges, we propose a novel intermediate step, the Virtual Fuzzing-based equivalence Test (VFT), and an agentic planning framework, the LLM-powered Agent for C-to-Rust code translation (LAC2R). The VFT guides LLMs to identify input arguments that induce divergent behaviors between an original C function and its Rust counterpart, and to generate informative diagnoses for refining the unsafe Rust code. LAC2R uses Monte Carlo Tree Search (MCTS) to systematically organize the LLM-induced intermediate steps toward a correct translation. Experiments on large-scale, real-world benchmarks demonstrate that LAC2R effectively performs C-to-Rust translation.
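The core idea behind the VFT, searching for inputs on which a C function and its Rust translation disagree, can be illustrated with plain differential random testing. This is a stand-in under stated assumptions, not the paper's method: in LAC2R the LLM is guided to propose divergence-inducing inputs and to write the diagnosis, whereas here a xorshift PRNG picks the inputs and a fixed template produces the diagnosis. The C function `wrap_index`, its Rust models, and all helper names (`differential_fuzz`, `diagnose`, `Divergence`) are hypothetical. The C side is modeled in Rust so the sketch runs without FFI; a real harness would call the compiled C function instead.

```rust
// Hypothetical C original being translated:
//     int wrap_index(int i, int n) { return i % n; }
// Modeled in Rust: for n != 0, Rust's `%` on i32 truncates toward zero like C99 `%`.
fn c_wrap_index(i: i32, n: i32) -> i32 {
    i % n
}

/// Plausible but wrong translation: `rem_euclid` is never negative, unlike C's `%`.
fn rust_wrap_index_buggy(i: i32, n: i32) -> i32 {
    i.rem_euclid(n)
}

/// Faithful translation.
fn rust_wrap_index_fixed(i: i32, n: i32) -> i32 {
    i % n
}

/// xorshift64 PRNG standing in for the input generator (no external crates needed).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Integer in [lo, hi]; modulo bias is irrelevant for this sketch.
    fn range_i32(&mut self, lo: i32, hi: i32) -> i32 {
        let span = (hi as i64 - lo as i64 + 1) as u64;
        (lo as i64 + (self.next_u64() % span) as i64) as i32
    }
}

#[derive(Debug, PartialEq)]
struct Divergence {
    input: (i32, i32),
    c_output: i32,
    rust_output: i32,
}

/// Runs both functions on the same random inputs and returns the first mismatch, if any.
fn differential_fuzz(
    reference: fn(i32, i32) -> i32,
    candidate: fn(i32, i32) -> i32,
    trials: usize,
    seed: u64, // must be nonzero for xorshift
) -> Option<Divergence> {
    let mut rng = XorShift64(seed);
    for _ in 0..trials {
        let i = rng.range_i32(-1000, 1000);
        let n = rng.range_i32(1, 100); // n > 0 avoids C's undefined `x % 0`
        let (c_output, rust_output) = (reference(i, n), candidate(i, n));
        if c_output != rust_output {
            return Some(Divergence { input: (i, n), c_output, rust_output });
        }
    }
    None
}

/// Template diagnosis; in LAC2R this feedback would be generated by the LLM instead.
fn diagnose(d: &Divergence) -> String {
    format!(
        "wrap_index({}, {}): C returns {}, Rust returns {}; check the sign of the remainder",
        d.input.0, d.input.1, d.c_output, d.rust_output
    )
}

fn main() {
    match differential_fuzz(c_wrap_index, rust_wrap_index_buggy, 10_000, 0x9E37_79B9_7F4A_7C15) {
        Some(d) => println!("{}", diagnose(&d)),
        None => println!("no divergence found"),
    }
    assert!(differential_fuzz(c_wrap_index, rust_wrap_index_fixed, 10_000, 42).is_none());
}
```

The buggy translation diverges on any negative `i` that is not a multiple of `n`, because C's `%` takes the sign of the dividend while `rem_euclid` never returns a negative value; the faithful translation produces no divergence.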
Problem

Research questions and friction points this paper is trying to address.

Addressing memory safety issues in C by translating to Rust
Overcoming the lack of parallel C-to-Rust datasets for in-context learning
Defining and organizing intermediate steps for accurate code translation

Innovation

Methods, ideas, or system contributions that make the work stand out.

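Introduces the Virtual Fuzzing-based equivalence Test (VFT) to detect behavioral divergence between C functions and their Rust translations
Implements the LLM-powered Agent for C-to-Rust code translation (LAC2R) framework
Applies Monte Carlo Tree Search (MCTS) to organize intermediate translation steps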
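To give a feel for how MCTS can organize a sequence of intermediate steps, here is a minimal UCT (MCTS with UCB1 selection) sketch. Everything specific in it is hypothetical: the three-step action set (`Translate`, `FuzzTest`, `Refine`), the depth of four, and especially the toy reward, which just counts matches against a fixed "ideal" trajectory. LAC2R's actual steps, reward, and search parameters are not reproduced here; a real agent would score the Rust code each trajectory produces.

```rust
// Hypothetical action set; the paper's actual intermediate steps are not reproduced here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Step { Translate, FuzzTest, Refine }

const STEPS: [Step; 3] = [Step::Translate, Step::FuzzTest, Step::Refine];
const DEPTH: usize = 4;
// Hypothetical "ideal" trajectory that exists only to define a toy reward.
const IDEAL: [Step; DEPTH] = [Step::Translate, Step::FuzzTest, Step::Refine, Step::FuzzTest];

/// Toy reward in [0, 1]: fraction of positions matching IDEAL (a real agent would score the code).
fn reward(traj: &[Step]) -> f64 {
    let hits = traj.iter().zip(IDEAL.iter()).filter(|(a, b)| a == b).count();
    hits as f64 / DEPTH as f64
}

#[derive(Default)]
struct Node {
    children: Vec<usize>, // arena indices, in STEPS order
    visits: f64,
    total: f64, // sum of backed-up rewards
}

/// xorshift64 PRNG for rollouts, so the sketch needs no external crates.
struct Rng(u64);
impl Rng {
    fn below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

/// Runs UCT-style MCTS and returns the node arena (index 0 = root, empty trajectory).
fn search(iterations: usize, seed: u64) -> Vec<Node> {
    let (mut tree, mut rng) = (vec![Node::default()], Rng(seed));
    for _ in 0..iterations {
        let (mut node, mut path, mut traj) = (0usize, vec![0usize], Vec::new());
        // 1-2. Selection and expansion: descend by UCB1 until a fresh node or full depth.
        while traj.len() < DEPTH {
            if tree[node].children.is_empty() {
                let first = tree.len();
                tree.extend((0..STEPS.len()).map(|_| Node::default()));
                tree[node].children = (first..first + STEPS.len()).collect();
            }
            let parent_n = tree[node].visits;
            let (mut best_k, mut best_score) = (0, f64::NEG_INFINITY);
            for (k, &c) in tree[node].children.iter().enumerate() {
                let ch = &tree[c];
                // Unvisited children score +inf so each is tried once before exploiting.
                let score = if ch.visits == 0.0 {
                    f64::INFINITY
                } else {
                    ch.total / ch.visits + std::f64::consts::SQRT_2 * (parent_n.ln() / ch.visits).sqrt()
                };
                if score > best_score { best_k = k; best_score = score; }
            }
            node = tree[node].children[best_k];
            path.push(node);
            traj.push(STEPS[best_k]);
            if tree[node].visits == 0.0 {
                break;
            }
        }
        // 3. Simulation: complete the trajectory with uniformly random steps.
        while traj.len() < DEPTH {
            traj.push(STEPS[rng.below(STEPS.len())]);
        }
        // 4. Backpropagation: credit the reward to every node on the path.
        let r = reward(&traj);
        for &n in &path {
            tree[n].visits += 1.0;
            tree[n].total += r;
        }
    }
    tree
}

/// Reads out the most-visited path from the root.
fn best_trajectory(tree: &[Node]) -> Vec<Step> {
    let (mut node, mut out) = (0, Vec::new());
    while let Some((k, &c)) = tree[node]
        .children
        .iter()
        .enumerate()
        .max_by(|a, b| tree[*a.1].visits.total_cmp(&tree[*b.1].visits))
    {
        out.push(STEPS[k]);
        node = c;
    }
    out
}

fn main() {
    let tree = search(5_000, 0x2545_F491_4F6C_DD1D);
    println!("{:?}", best_trajectory(&tree));
}
```

Because the toy reward decomposes per position, the search problem itself is easy; the point is only to show the four MCTS phases (selection, expansion, simulation, backpropagation) and how the most-visited path is read out as the chosen trajectory.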