🤖 AI Summary
Symbolic execution at the machine-code level must remain bit-precise while coping with state explosion in both control and data flow.
Method: This paper presents a bit-precise symbolic execution approach for RISC-V machine code restricted to integer arithmetic, built from two tools: rotor for model generation and bitme for bounded model checking. Its core idea is to offload reasoning about both control and data flow to SMT solvers and other solver technology, so that bounded model checking happens directly at the machine-code level. Bitme further implements two slightly generalized kinds of binary decision diagram (BDD), ADDs and CFLOBDDs, which propagate program input through the model so that SMT solvers are involved only when input cannot be propagated.
Contribution/Results: State-of-the-art SMT solvers struggle in the reported experiments, but BDD-based input propagation significantly speeds up SMT solving. The paper also studies the impact on state explosion of CFLOBDDs, which are potentially more scalable than ADDs.
📝 Abstract
Symbolic execution is a powerful technique for analyzing the behavior of software, yet scalability remains a challenge due to state explosion in control and data flow. Existing tools typically manage control flow internally, often at the expense of completeness, while offloading reasoning over data flow to SMT solvers. Moreover, reasoning typically happens at the source-code or intermediate-representation level to leverage structural information, which makes machine code generation part of the trust base. We are interested in changing the equation in two non-trivial ways: pushing reasoning down to the machine-code level, and then offloading reasoning entirely to SMT solvers and other, possibly more efficient solver technology. In more abstract terms, we are asking whether bit-precise reasoning technology can be made scalable on software, and not just hardware. For this purpose, we developed two tools called rotor and bitme for model generation and bounded model checking, respectively. We chose RISC-V restricted to integer arithmetic as the modeling target for rotor since RISC-V integer semantics is essentially equivalent to established SMT semantics over bitvectors and arrays of bitvectors. While state-of-the-art SMT solvers struggle in our experiments, we have evidence that there is potential for improvement. To show the potential, we have slightly generalized and then implemented in bitme two types of binary decision diagrams (BDDs): algebraic decision diagrams (ADDs) and context-free-language ordered binary decision diagrams (CFLOBDDs). Bitme uses BDDs to propagate program input through models, essentially generalizing constant propagation to domain propagation. SMT solvers get involved only when model input cannot be propagated, significantly speeding up SMT solving. We then study the impact on state explosion of CFLOBDDs, which are potentially more scalable than ADDs.
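To make "generalizing constant propagation to domain propagation" concrete, here is a minimal sketch of the idea for a single symbolic input byte. It is not bitme's implementation: all names are hypothetical, and a plain Python dict stands in for the ADD or CFLOBDD that would represent the same function from input values to 64-bit results. It assumes RISC-V-style unsigned 64-bit wraparound for subtraction and an `sltu`-like unsigned comparison.

```python
# Hypothetical sketch of domain propagation over one input byte.
# A "domain" maps every possible input value (0..255) to the 64-bit value
# an expression takes for that input. A decision diagram over the 8 input
# bits would represent the same function; a dict is used here for clarity.

MASK64 = (1 << 64) - 1  # 64-bit machine words wrap around modulo 2**64


def input_byte():
    """Domain of an unconstrained input byte: the identity on 0..255."""
    return {x: x for x in range(256)}


def const(value):
    """Domain of a constant: the same value for every input."""
    return {x: value & MASK64 for x in range(256)}


def apply(op, a, b):
    """Apply a binary operation pointwise, once per possible input value."""
    return {x: op(a[x], b[x]) & MASK64 for x in a}


def sub(a, b):
    """64-bit subtraction with wraparound."""
    return apply(lambda u, v: u - v, a, b)


def sltu(a, b):
    """Unsigned less-than, producing 1 or 0."""
    return apply(lambda u, v: int(u < v), a, b)


def is_constant(domain):
    """Constant propagation is the special case of a single result value."""
    return len(set(domain.values())) == 1


def satisfying_inputs(cond):
    """Inputs for which a condition domain is non-zero."""
    return sorted(x for x, v in cond.items() if v != 0)


# Model of "is the input byte an ASCII digit?": (c - 48) <u 10.
c = input_byte()
offset = sub(c, const(48))
is_digit = sltu(offset, const(10))
digits = satisfying_inputs(is_digit)
```

Because the condition depends only on the input byte, propagation alone yields the exact set of inputs that take the branch, and no solver query is needed in this sketch. An explicit table like this grows exponentially with the number of input bits, which is the reason for using shared diagram structures instead.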