Obfuscation-Resilient Binary Code Similarity Analysis using Dominance Enhanced Semantic Graph

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Control-flow instability under code obfuscation degrades the performance of binary code similarity analysis (BCSA). To address this, we propose the Dominance Enhanced Semantic Graph (DESG), a novel graph representation that eschews fragile control-flow structures and instead integrates multi-granular semantics, from individual instructions to basic blocks, augmented with program dominator relationships. We further design ORCAS, an obfuscation-resilient semantic similarity framework that leverages contrastive learning to robustly match binary functions compiled with different obfuscation options, optimization levels, and instruction set architectures. On the BinKit dataset, ORCAS improves PR-AUC by an average of 12.1% over state-of-the-art approaches when three obfuscation options are combined; on a newly constructed real-world obfuscated vulnerability dataset, it improves recall by up to 43%. We also publicly release this new dataset to support future BCSA research.
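To make the contrastive-learning idea concrete, here is a minimal sketch of one commonly used contrastive objective (InfoNCE over cosine similarities). The summary does not say which loss ORCAS uses, so the loss choice, the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """InfoNCE loss: low when the anchor is closer to the positive
    (same function, different compilation) than to every negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy embeddings (made up): the anchor and positive stand for the same
# function compiled with and without obfuscation.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

good = info_nce(anchor, positive, negatives)
bad = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In this toy setup the loss is small when the positive pair is well aligned (`good`) and large when a negative is closer to the anchor than the positive (`bad`), which is the signal that pushes variants of the same function together in embedding space.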

📝 Abstract
Binary code similarity analysis (BCSA) serves as a core technique for binary analysis tasks such as vulnerability detection. While current graph-based BCSA approaches capture substantial semantics and perform well, they degrade under code obfuscation because the control flow becomes unstable. To address this issue, we develop ORCAS, an Obfuscation-Resilient BCSA model based on the Dominance Enhanced Semantic Graph (DESG). The DESG is an original binary code representation that captures more of a binary's implicit semantics without relying on control flow structure, including inter-instruction relations, inter-basic block relations, and instruction-basic block relations. ORCAS robustly scores semantic similarity across binary functions compiled with different obfuscation options, optimization levels, and instruction set architectures. Extensive evaluation on the BinKit dataset shows that ORCAS significantly outperforms eight baselines, achieving an average 12.1% PR-AUC gain over state-of-the-art approaches when three obfuscation options are combined. Furthermore, ORCAS improves recall by up to 43% on an original obfuscated real-world vulnerability dataset, which we release to facilitate future research.
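For readers unfamiliar with the dominance relation that DESG builds on: a basic block d dominates a block n if every path from the function entry to n passes through d. The sketch below computes dominator sets and immediate dominators with the textbook iterative data-flow algorithm. It is not the ORCAS implementation; the function names, the dictionary-based graph encoding, and the toy diamond-shaped graph are hypothetical, and every block is assumed to be reachable from the entry.

```python
from typing import Dict, List, Set


def compute_dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Return, for each block, the set of blocks that dominate it.

    `cfg` maps a block name to the list of its successor blocks.
    """
    nodes: Set[str] = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)

    # Build predecessor sets from the successor lists.
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom: Dict[str, Set[str]] = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            pred_doms = [dom[p] for p in preds[n]]
            new = set.intersection(*pred_doms) if pred_doms else set()
            new = new | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom: Dict[str, Set[str]], entry: str) -> Dict[str, str]:
    """Pick, for each non-entry block, its closest strict dominator."""
    idom: Dict[str, str] = {}
    for n, ds in dom.items():
        strict = ds - {n}
        if n == entry or not strict:
            continue
        # Dominators of n form a chain; the closest one has the largest set.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


# Toy diamond: entry branches to a and b, which both rejoin at exit.
toy_cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
dom = compute_dominators(toy_cfg, "entry")
idom = immediate_dominators(dom, "entry")
```

In the toy graph, `exit` is dominated only by `entry` and itself, because neither branch is mandatory. The relation depends on which blocks must execute before others rather than on individual edges.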
Problem

Research questions and friction points this paper is trying to address.

Enhance binary code similarity analysis under obfuscation
Capture implicit semantics without control flow structure
Improve robustness across obfuscation, optimization, and architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dominance Enhanced Semantic Graph for binary representation
Obfuscation-Resilient BCSA model named ORCAS
Robust similarity scoring across diverse binary functions
Authors
Yufeng Wang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
Yuhong Feng
Associate Professor
Workflow Management, Cloud Computing, The Internet of Things, Linux Operating System
Yixuan Cao
Shenzhen University
Software Engineering, Security, Kernel & Compiler, Testing & Verification, Big Data
Haoran Li
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
Haiyue Feng
College of Computer Science & Software Engineering, Shenzhen University
Large Language Models
Yifeng Wang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China