Augmenting Smart Contract Decompiler Output through Fine-grained Dependency Analysis and LLM-facilitated Semantic Recovery

📅 2025-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key challenges in Solidity smart contract decompilation (inaccurate method boundary detection, erroneous variable type recovery, and missing contract attribute reconstruction), this paper proposes SmartHalo, a framework that enhances decompiler output by combining static analysis with large language models (LLMs). Its core contributions are: (1) a new dependency graph (DG) data structure that models fine-grained semantic dependencies; (2) symbolic execution and formal verification to validate the correctness of LLM outputs; and (3) a combination of static analysis precision with LLM-based semantic understanding. Evaluated on 465 randomly selected contract methods, SmartHalo significantly improves on state-of-the-art decompilers such as Gigahorse; with GPT-4o, it reaches precision of 87.39% for method boundaries, 90.39% for variable types, and 80.65% for contract attributes.

📝 Abstract
A decompiler is a specialized reverse engineering tool widely used in program analysis tasks, particularly program comprehension and vulnerability detection. However, current Solidity smart contract decompilers face significant limitations in reconstructing the original source code. In particular, state-of-the-art (SOTA) decompilers are bottlenecked by inaccurate method identification, incorrect variable type recovery, and missing contract attributes. These deficiencies hinder downstream tasks and understanding of the program logic. To address these challenges, we propose SmartHalo, a new framework that enhances decompiler output by combining static analysis (SA) and large language models (LLMs). SmartHalo leverages the complementary strengths of SA's accuracy in control and data flow analysis and LLMs' capability in semantic prediction. More specifically, SmartHalo constructs a new data structure, the Dependency Graph (DG), to extract semantic dependencies via static analysis. It then uses the DG to create prompts for LLM optimization. Finally, the correctness of the LLM outputs is validated through symbolic execution and formal verification. Evaluation on a dataset of 465 randomly selected smart contract methods shows that SmartHalo significantly improves the quality of the decompiled code compared to SOTA decompilers (e.g., Gigahorse). Notably, integrating GPT-4o with SmartHalo further enhances its performance, achieving precision rates of 87.39% for method boundaries, 90.39% for variable types, and 80.65% for contract attributes.
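The first two stages described in the abstract (extracting dependencies, then building a prompt from them) can be illustrated with a small sketch. This is not SmartHalo's implementation: the `DependencyGraph` class, the `build_prompt` helper, the statement strings, and the prompt wording are all hypothetical, and the validation stage (symbolic execution and formal verification) is omitted.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Toy dependency graph (hypothetical): nodes are decompiled statements,
    edges record that one statement depends on another."""

    def __init__(self):
        self.statements = {}            # node id -> statement text
        self.edges = defaultdict(list)  # node id -> [(dependency id, kind)]

    def add_statement(self, node_id, text):
        self.statements[node_id] = text

    def add_dependency(self, src, dst, kind):
        # Record that statement `src` depends on statement `dst`.
        self.edges[src].append((dst, kind))

    def slice_for(self, node_id):
        """Return the sorted ids of node_id and everything it transitively depends on."""
        seen = {node_id}
        queue = deque([node_id])
        while queue:
            current = queue.popleft()
            for dep, _kind in self.edges[current]:
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
        return sorted(seen)


def build_prompt(graph, node_id):
    """Assemble an illustrative prompt from only the statements relevant to node_id."""
    lines = [graph.statements[i] for i in graph.slice_for(node_id)]
    return "Infer the Solidity type of each variable in:\n" + "\n".join(lines)


# Assumed decompiler-style statements, invented for illustration.
graph = DependencyGraph()
graph.add_statement(1, "v0 = msg.sender")
graph.add_statement(2, "v1 = storage[0]")
graph.add_statement(3, "require(v0 == v1)")
graph.add_statement(4, "v2 = calldata[4]")
graph.add_dependency(3, 1, "data")
graph.add_dependency(3, 2, "data")

prompt = build_prompt(graph, 3)
```

Here the prompt for statement 3 includes statements 1 and 2 but leaves out the unrelated statement 4, which is the general idea of using dependencies to give the LLM focused context.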
Problem

Research questions and friction points this paper is trying to address.

Smart Contract Decompilation
Method Identification
Variable Type Determination
Innovation

Methods, ideas, or system contributions that make the work stand out.

SmartHalo
Dependency Graph Optimization
GPT-4o Integration
Zeqin Liao
Sun Yat-sen University, China
Yuhong Nan
Sun Yat-sen University
System Security, Privacy Protection
Zixu Gao
Sun Yat-sen University, China
Henglong Liang
Sun Yat-sen University, China
Sicheng Hao
Sun Yat-sen University, China
Peifan Reng
Sun Yat-sen University, China
Zibin Zheng
IEEE Fellow, Highly Cited Researcher, Sun Yat-sen University, China
Blockchain, Smart Contract, Services Computing, Software Reliability