Precise Static Identification of Ethereum Storage Variables

📅 2025-03-26

🤖 AI Summary

Problem: Precisely identifying dynamic storage structures (e.g., mappings and arrays) in deployed Ethereum smart contract bytecode is difficult, because the bytecode carries no high-level type information and compilation erases the storage-layout metadata. Method: This paper proposes a deep static analysis framework integrating abstract interpretation and symbolic execution to reconstruct storage layouts and infer types directly from bytecode. The recovered layouts are often more complete than the storage-layout descriptions the compiler itself emits. Contribution/Results: Without access to source code, the approach achieves 98.6% precision and at least 92.6% recall, versus 80.8% and 68.2% for a state-of-the-art tool (gaps of 17.8 and 24.4 percentage points, respectively). It scales to nearly all contracts and accurately resolves deeply nested and composite data structures. By delivering high-fidelity, complete storage semantics, the method provides a foundation for bytecode-level reverse engineering, security auditing, and formal verification of Ethereum smart contracts.
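The 17.8 and 24.4 figures are absolute differences in percentage points (pp), not relative gains. Using the standard definitions of precision and recall (assuming the usual counts of true positives TP, false positives FP, and false negatives FN over recovered storage variables; the summary does not spell out the paper's exact counting), the comparison works out as below. Because the reported recall is a lower bound, the recall gap is at least 24.4 pp.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \\
\Delta_{P} &= 98.6\% - 80.8\% = 17.8\ \text{pp} \qquad (98.6 / 80.8 \approx 1.22) \\
\Delta_{R} &\geq 92.6\% - 68.2\% = 24.4\ \text{pp} \qquad (92.6 / 68.2 \approx 1.36)
\end{aligned}
```

In relative terms, that is roughly 22% higher precision and at least 36% higher recall than the compared tool.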

📝 Abstract

Smart contracts are small programs that run autonomously on the blockchain, using it as their persistent memory. The predominant platform for smart contracts is the Ethereum Virtual Machine (EVM). For EVM smart contracts, a problem with significant applications is identifying data structures in blockchain state (a.k.a. "storage") given only the deployed contract code. The problem has been highly challenging and has often been considered nearly impossible to address satisfactorily. (For reference, the latest state-of-the-art research tool fails to recover nearly all complex data structures and scales to under 50% of contracts.) Much of the complication is that the main on-chain data structures (mappings and arrays) have their locations derived dynamically through code execution. We propose sophisticated static analysis techniques that identify on-chain data structures with extremely high fidelity and completeness. Our analysis scales nearly universally and recovers deep data structures. Our techniques identify the exact types of data structures with 98.6% precision and at least 92.6% recall, compared to 80.8% and 68.2%, respectively, for a state-of-the-art tool. Strikingly, the analysis is often more complete than the storage description that the compiler itself produces with full access to the source code.

Problem

Identify Ethereum smart contract storage variables statically

Recover complex on-chain data structures with high accuracy

Improve precision and recall beyond current state-of-the-art tools

Innovation

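Static analysis that resolves the dynamically computed storage locations of mappings and arrays from deployed bytecode alone

High-fidelity identification of data structures, including deeply nested ones

Higher precision and recall than the state of the art (98.6% and at least 92.6%, versus 80.8% and 68.2%)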
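To make the reverse direction concrete, here is a toy, hypothetical pattern-matcher over symbolic storage-address expressions: a hash of two words (key, base) is read as one mapping level, and a hash of one word plus an offset as a dynamic-array element, recursing into the base to recover nested types. The expression classes (`Const`, `Var`, `Sha3`, `Add`) and the `access_path`/`render` helpers are invented for illustration and are not the paper's algorithm. A real analysis must first derive such expressions from raw bytecode and handle many cases this sketch ignores (struct field offsets, packed elements, reordered operands, `string`/`bytes` keys).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    value: int  # a literal slot number


@dataclass(frozen=True)
class Var:
    name: str  # an unknown runtime value, e.g. a key or index from calldata


@dataclass(frozen=True)
class Sha3:
    args: tuple  # 32-byte words hashed together, in order


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Sha3, Add]


def access_path(addr: Expr) -> Optional[Tuple[int, List[str]]]:
    """Return (root slot, access kinds from the root outward), or None if unmatched."""
    if isinstance(addr, Const):
        return addr.value, []
    if isinstance(addr, Sha3) and len(addr.args) == 2:
        # keccak256(key ++ base): one mapping lookup on the variable at `base`
        inner = access_path(addr.args[1])
        return None if inner is None else (inner[0], inner[1] + ["mapping"])
    if isinstance(addr, Add) and isinstance(addr.left, Sha3) and len(addr.left.args) == 1:
        # keccak256(base) + index: element of the dynamic array at `base`
        inner = access_path(addr.left.args[0])
        return None if inner is None else (inner[0], inner[1] + ["array"])
    return None


def render(kinds: List[str]) -> str:
    """Build a Solidity-like type string, wrapping from the innermost access outward."""
    t = "value"
    for kind in reversed(kinds):
        t = f"mapping(key => {t})" if kind == "mapping" else f"{t}[]"
    return t


if __name__ == "__main__":
    # m[k1][k2] for a two-level mapping declared at slot 3
    nested = Sha3((Var("k2"), Sha3((Var("k1"), Const(3)))))
    slot, kinds = access_path(nested)
    print(slot, render(kinds))  # 3 mapping(key => mapping(key => value))
    # m[k][j] where m is a mapping(key => value[]) declared at slot 2
    arr = Add(Sha3((Sha3((Var("k"), Const(2))),)), Var("j"))
    slot, kinds = access_path(arr)
    print(slot, render(kinds))  # 2 mapping(key => value[])
```

The recursion mirrors how the slot formulas nest: each recognized hashing layer peels off one level of the type, so depth of recovery is limited only by how well the address expressions themselves are reconstructed.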