🤖 AI Summary
Existing LLMs for software engineering are trained in executable runtime environments (e.g., for GitHub issue resolution), but cybersecurity tasks often lack stable execution contexts, which prevents direct adoption of this paradigm. This paper introduces Cyber-Zero, the first runtime-free framework for synthesizing agent trajectories to train cybersecurity LLMs. Its core idea is to use publicly available CTF writeups and persona-driven LLM simulation to reverse-engineer runtime behaviors, generating realistic, long-horizon interaction trajectories without any actual environment. On three CTF benchmarks (InterCode-CTF, NYU CTF Bench, and Cybench), agents trained on these trajectories achieve absolute performance gains of up to 13.1% over baseline models. The best model, Cyber-Zero-32B, sets a new state of the art among open-weight models and matches the capabilities of DeepSeek-V3-0324 and Claude-3.5-Sonnet at lower cost, suggesting that runtime-free trajectory synthesis can lower the barrier to developing cybersecurity agents.
📝 Abstract
Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, and demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
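To make the idea of runtime-free, persona-driven synthesis concrete, here is a minimal sketch. It is not the authors' implementation. The names `Turn`, `synthesize_trajectory`, and `make_scripted`, the prompt wording, the `submit` stop condition, and the choice of which persona sees the writeup are all assumptions made for illustration. Two model callables alternate: one plays the CTF player, and the other imitates a terminal by inventing plausible output from the writeup, so no command is ever executed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Turn:
    role: str      # "player" or "terminal"
    content: str   # a command (player) or simulated output (terminal)


def synthesize_trajectory(writeup: str, player: LLM, terminal: LLM,
                          max_turns: int = 8) -> List[Turn]:
    """Alternate two personas to build a trajectory without running anything."""
    history: List[Turn] = []
    for _ in range(max_turns):
        transcript = "\n".join(f"{t.role}: {t.content}" for t in history)
        # Player persona: proposes the next command from the transcript alone.
        command = player(f"You are a CTF player.\n{transcript}\nNext command:")
        history.append(Turn("player", command))
        if command.strip().startswith("submit"):
            break  # assumed convention: a 'submit' command ends the episode
        # Terminal persona: reads the writeup and imagines the command's output.
        output = terminal(
            f"You simulate a terminal.\nWriteup:\n{writeup}\n"
            f"{transcript}\n$ {command}\nOutput:"
        )
        history.append(Turn("terminal", output))
    return history


def make_scripted(lines: Iterable[str]) -> LLM:
    """Stand-in for a real model: replays fixed responses in order."""
    it = iter(lines)
    return lambda prompt: next(it)


# Toy run with scripted stand-ins instead of real LLM calls.
demo = synthesize_trajectory(
    writeup="The flag is stored in flag.txt in the working directory.",
    player=make_scripted(["ls", "cat flag.txt", "submit flag{demo}"]),
    terminal=make_scripted(["flag.txt", "flag{demo}"]),
)
for turn in demo:
    print(f"{turn.role}: {turn.content}")
```

In practice the scripted stand-ins would be replaced by calls to actual models, and the resulting turns would be filtered and formatted as training data. Those steps are outside the scope of this sketch.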