ReGraph: A Tool for Binary Similarity Identification

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Binary code similarity detection (BCSD) must match functions across architectures and optimization levels, and existing deep learning approaches carry high computational overhead. This paper proposes a lightweight graph-based representation framework that avoids complex neural networks, instead using abstracted control-flow graphs (CFGs) and efficient graph edit distance (GED) for structural matching, with no GPU acceleration required. The method runs inference 700× faster than NLP-based deep learning baselines while keeping accuracy comparable to state-of-the-art (SOTA) models on mainstream public benchmarks. Its core contribution is described as the first application of scalable, efficient GED computation to large-scale BCSD, which improves the balance between accuracy and efficiency and makes industrial-scale binary analysis practical in resource-constrained environments.
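
To make the idea of structural matching with graph edit distance concrete, here is a minimal sketch in Python. It is not ReGraph's implementation: the function name `graph_edit_distance`, the unit edit costs, the unlabeled basic blocks, and the brute-force search over node mappings are all assumptions chosen for illustration, and the search is only feasible for very small graphs. Because nodes carry no labels and every edit costs 1, mapping every node of the smaller graph onto some node of the larger one is never worse than deleting and re-inserting it, so the sketch only enumerates such mappings.

```python
from itertools import permutations


def graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Exact unit-cost edit distance between two small, unlabeled, directed graphs.

    Illustrative assumption: each node or edge insertion/deletion costs 1,
    and nodes (basic blocks) carry no labels.
    """
    # Unit-cost GED is symmetric, so make graph A the smaller one.
    if len(nodes_a) > len(nodes_b):
        nodes_a, edges_a, nodes_b, edges_b = nodes_b, edges_b, nodes_a, edges_a
    nodes_a, nodes_b = list(nodes_a), list(nodes_b)
    edges_a, edges_b = set(edges_a), set(edges_b)

    # Nodes of B left without a partner must be inserted.
    node_cost = len(nodes_b) - len(nodes_a)

    best = None
    # Try every injective mapping of A's nodes onto B's nodes.
    for image in permutations(nodes_b, len(nodes_a)):
        mapping = dict(zip(nodes_a, image))
        mapped_edges = {(mapping[u], mapping[v]) for u, v in edges_a}
        # Edges present on only one side must be deleted or inserted.
        edge_cost = len(mapped_edges ^ edges_b)
        cost = node_cost + edge_cost
        if best is None or cost < best:
            best = cost
    return best


# Hypothetical CFG of an if/else: entry -> then, entry -> else, both -> exit.
if_else = ([0, 1, 2, 3], [(0, 1), (0, 2), (1, 3), (2, 3)])
# Hypothetical CFG of an if without else: entry -> then -> exit, entry -> exit.
if_only = ([0, 1, 2], [(0, 1), (0, 2), (1, 2)])

distance_same = graph_edit_distance(*if_else, *if_else)
distance_diff = graph_edit_distance(*if_only, *if_else)
print(distance_same, distance_diff)
```

A smaller distance means the two control-flow shapes are closer. The brute-force enumeration grows factorially with the number of blocks, which is why a practical system needs a scalable GED computation of the kind the summary attributes to the paper.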

📝 Abstract
Binary Code Similarity Detection (BCSD) is essential not only for security tasks such as vulnerability identification but also for detecting code copying, yet it remains challenging due to binary stripping and diverse compilation environments. Existing methods tend to adopt increasingly complex neural networks to improve accuracy, and computation time grows with that complexity. Even with powerful GPUs, processing large-scale software is time-consuming. To address these issues, we present ReGraph, a framework that efficiently compares binary code functions across architectures and optimization levels. Our evaluation on public datasets shows that ReGraph has a significant speed advantage, running 700 times faster than Natural Language Processing (NLP)-based methods while maintaining accuracy comparable to state-of-the-art models.
Problem

Research questions and friction points this paper is trying to address.

Detect binary code similarity across architectures efficiently
Address slow computation in large-scale binary analysis
Balance speed and accuracy in binary similarity detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient binary code comparison across architectures
700x faster than NLP-based methods
Maintains accuracy comparable to state-of-the-art models
Li Zhou
KAUST, KSA
Marc Dacier
RC3, CEMSE - KAUST
computer security, dependability, intrusion detection
Charalambos Konstantinou
KAUST, KSA