JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of a systematic evaluation benchmark for JavaScript deobfuscation has hindered rigorous assessment of large language models (LLMs) in web security contexts. Method: The paper introduces JsDeObsBench, the first LLM-specific benchmark for JS deobfuscation, covering prevalent obfuscation techniques including variable renaming, control-flow flattening, and string encryption. We conduct a systematic evaluation of state-of-the-art models (GPT-4o, Mixtral, Llama, DeepSeek-Coder) under dual evaluation criteria: automated execution correctness and semantic equivalence verification. Contribution/Results: Our results demonstrate LLMs' significant advantage in code simplification over traditional tools, but also expose critical weaknesses in syntactic correctness and runtime reliability. Empirically, we validate the feasibility of deploying LLMs to analyze malicious JavaScript and identify concrete optimization directions, particularly enhancing structural fidelity and executable robustness.
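The dual evaluation criteria above can be sketched as a minimal equivalence harness (a hypothetical illustration, not the paper's actual implementation): compile both the original and the deobfuscated snippet, then compare their observable outputs on sampled inputs. The convention that each snippet defines a function `f` is an assumption of this sketch.

```javascript
// Hypothetical sketch of execution-correctness + semantic-equivalence
// checking; assumes each snippet defines a function `f`. Not the
// paper's actual harness.
function behavesEquivalently(originalSrc, deobfuscatedSrc, inputs) {
  // Compile each snippet in an isolated function scope and extract `f`.
  const load = (src) => new Function(`${src}; return f;`)();
  let fOrig, fDeob;
  try {
    fOrig = load(originalSrc);
    fDeob = load(deobfuscatedSrc);
  } catch (e) {
    return false; // syntax or load-time failure counts as non-equivalent
  }
  // Semantic check: identical outputs across all sampled inputs.
  return inputs.every((x) => {
    try {
      return JSON.stringify(fOrig(x)) === JSON.stringify(fDeob(x));
    } catch (e) {
      return false; // runtime failure on either side
    }
  });
}
```

For example, `behavesEquivalently('function f(n){return n*2}', 'function f(n){return n+n}', [0, 1, 5])` accepts the rewrite, while a snippet that fails to parse or diverges on any input is rejected.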

📝 Abstract
Deobfuscating JavaScript (JS) code poses a significant challenge in web security, particularly as obfuscation techniques are frequently used to conceal malicious activities within scripts. While Large Language Models (LLMs) have recently shown promise in automating the deobfuscation process, transforming detection and mitigation strategies against these obfuscated threats, a systematic benchmark to quantify their effectiveness and limitations has been notably absent. To address this gap, we present JsDeObsBench, a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We detail our benchmarking methodology, which includes a wide range of obfuscation techniques ranging from basic variable renaming to sophisticated structure transformations, providing a robust framework for assessing LLM performance in real-world scenarios. Our extensive experimental analysis investigates the proficiency of cutting-edge LLMs, e.g., GPT-4o, Mixtral, Llama, and DeepSeek-Coder, revealing superior performance in code simplification despite challenges in maintaining syntax accuracy and execution reliability compared to baseline methods. We further evaluate the deobfuscation of JS malware to exhibit the potential of LLMs in security scenarios. The findings highlight the utility of LLMs in deobfuscation applications and pinpoint crucial areas for further improvement.
Problem

Research questions and friction points this paper is trying to address.

Measuring LLM effectiveness in JavaScript deobfuscation
Benchmarking LLMs against diverse obfuscation techniques
Evaluating LLMs' performance in JS malware deobfuscation
Innovation

Methods, ideas, or system contributions that make the work stand out.

JsDeObsBench benchmarks LLMs for JS deobfuscation
Methodology includes diverse obfuscation techniques
LLMs excel at code simplification but lag in syntax accuracy and runtime reliability
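The obfuscation techniques the benchmark covers can be illustrated with a minimal sketch (a hypothetical snippet, not drawn from JsDeObsBench): variable renaming plus string-array indirection with hex-escaped literals, alongside the readable form a deobfuscator should recover.

```javascript
// Hypothetical obfuscated input: opaque identifiers and a hex-escaped
// string array standing in for the literal "Hello".
const _0x1a2b = ["\x48\x65\x6c\x6c\x6f"];
function _0xf00(_0xa) {
  return _0x1a2b[0] + ", " + _0xa; // string-array indirection
}

// Deobfuscated equivalent a tool or LLM should recover:
function greet(name) {
  return "Hello, " + name;
}
```

A benchmark entry of this kind is judged both on whether the recovered code parses and runs, and on whether it remains semantically equivalent to the obfuscated original.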
Guoqiang Chen
QI-ANXIN Technology Research Institute
Binary Analysis · LLM · Agent · Fuzzing
Xin Jin
The Ohio State University
Zhiqiang Lin
The Ohio State University