ReF Decompile: Relabeling and Function Call Enhanced Decompile

📅 2025-02-17
🤖 AI Summary
Existing end-to-end decompilation methods struggle to accurately recover control-flow structures and variable information, which limits how faithfully they can reconstruct program logic. This paper proposes ReF Decompile, an end-to-end large language model method for binary decompilation. Its core contributions are: (1) a *Relabeling* strategy that replaces jump target addresses with labels so that control flow stays clear; and (2) a *Function Call* strategy that infers variable types and retrieves missing variable information from the binary file. Evaluated on the Humaneval-Decompile benchmark, it reports a state-of-the-art score of 61.43%, surpassing comparable baselines. Decompilation of this kind supports reverse engineering tasks such as vulnerability identification, malware analysis, and legacy software migration.

📝 Abstract
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into a high-level programming language, enabling analysis in scenarios where source code is unavailable. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration. End-to-end decompilation methods based on large language models (LLMs) reduce reliance on additional tools and minimize manual intervention. However, previous end-to-end methods often lose information that is critical for reconstructing control flow structures and variables when processing binary files, making it difficult to accurately recover the program's logic. To address these issues, we propose the ReF Decompile method, which incorporates the following innovations: (1) The Relabeling strategy replaces jump target addresses with labels, preserving control flow clarity. (2) The Function Call strategy infers variable types and retrieves missing variable information from binary files. Experimental results on the Humaneval-Decompile benchmark demonstrate that ReF Decompile surpasses comparable baselines and achieves state-of-the-art (SOTA) performance of 61.43%.
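
To make the relabeling idea concrete, here is a minimal sketch of replacing jump target addresses with labels in a disassembly listing. It is an illustration only, not the authors' implementation: the objdump-like input format (`address: instruction`), the `relabel` function, and the `L0`, `L1`, ... naming scheme are all assumptions made for this example.

```python
import re

# Assumed input format (hypothetical): one "<hex address>: <instruction>" per line.
# A direct jump is any mnemonic starting with "j" followed by a hex target address.
JUMP_RE = re.compile(r"^(j\w+)\s+(?:0x)?([0-9a-fA-F]+)$")


def jump_target(insn):
    """Return (mnemonic, target address) for a direct jump, else None."""
    m = JUMP_RE.match(insn)
    return (m.group(1), m.group(2).lower()) if m else None


def relabel(listing):
    """Replace jump target addresses with symbolic labels and drop raw addresses."""
    rows = []
    for line in listing.strip().splitlines():
        addr, _, insn = line.partition(":")
        rows.append((addr.strip().lower(), insn.strip()))

    # Pass 1: assign a label to every jump target, in order of first use.
    labels = {}
    for _, insn in rows:
        jump = jump_target(insn)
        if jump:
            labels.setdefault(jump[1], f"L{len(labels)}")

    # Pass 2: emit a label line before each target and rewrite the jumps.
    out = []
    for addr, insn in rows:
        if addr in labels:
            out.append(f"{labels[addr]}:")
        jump = jump_target(insn)
        if jump:
            insn = f"{jump[0]} {labels[jump[1]]}"
        out.append(f"    {insn}")
    return "\n".join(out)


SAMPLE = """
1129: cmp edi, 0
112c: jle 1135
112e: mov eax, 1
1133: jmp 113a
1135: mov eax, 0
113a: ret
"""

if __name__ == "__main__":
    print(relabel(SAMPLE))
```

After relabeling, the listing no longer depends on absolute addresses: `jle 1135` becomes `jle L0`, and a line `L0:` marks the instruction it jumps to, so the branch structure stays readable even when address columns are stripped from the model input.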

Problem

Research questions and friction points this paper is trying to address.

Decompilation of low-level code
Preserving control flow clarity
Inferring variable types accurately

Innovation

Methods, ideas, or system contributions that make the work stand out.

Relabeling strategy enhances control flow clarity
Function Call strategy infers variable types
Achieves state-of-the-art decompilation performance
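
The summary does not spell out how the Function Call strategy reads variable information from the binary, so the following is only one possible reading: a helper that a model could call to fetch a typed value stored at a given address in a data section. The `read_value` name, its parameters, the type table, and the sample section contents are all hypothetical and chosen for illustration.

```python
import struct

# Hypothetical mapping from C type names to little-endian struct formats.
FORMATS = {"int": "<i", "long": "<q", "float": "<f", "double": "<d"}


def read_value(section, base, address, ctype):
    """Decode one value of type `ctype` stored at `address`.

    `section` holds the raw bytes of a data section that is assumed to be
    loaded at virtual address `base`.
    """
    fmt = FORMATS[ctype]
    offset = address - base
    if offset < 0 or offset + struct.calcsize(fmt) > len(section):
        raise ValueError(f"address {address:#x} is outside the section")
    return struct.unpack_from(fmt, section, offset)[0]


# Made-up read-only data: a double at 0x2000 followed by an int at 0x2008.
RODATA_BASE = 0x2000
RODATA = struct.pack("<d", 0.5) + struct.pack("<i", 42)

if __name__ == "__main__":
    print(read_value(RODATA, RODATA_BASE, 0x2000, "double"))  # 0.5
    print(read_value(RODATA, RODATA_BASE, 0x2008, "int"))     # 42
```

The point of the sketch is that assembly alone shows only an address such as `0x2000`; recovering the constant stored there requires both a guess at its type and access to the bytes in the binary file.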