Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

📅 2025-11-16
🤖 AI Summary
Existing LLM red-teaming frameworks are limited to reusing or composing pre-existing attack strategies and cannot autonomously invent new jailbreak mechanisms. This work introduces EvoSynth, a multi-agent framework for the evolutionary synthesis of code-level attacks. It autonomously designs and iteratively optimizes attack algorithms through collaborative evolution, self-correcting program synthesis, and dynamic execution. Its core innovation is moving beyond prompt-based optimization to construct jailbreak logic de novo. Evaluated on robust models including Claude Sonnet 4.5, EvoSynth achieves an 85.5% attack success rate. The generated adversarial samples also show significantly higher semantic and structural diversity than those of state-of-the-art methods.

📝 Abstract
Automated red teaming frameworks for Large Language Models (LLMs) have become increasingly sophisticated, yet they share a fundamental limitation: their jailbreak logic is confined to selecting, combining, or refining pre-existing attack strategies. This binds their creativity and leaves them unable to autonomously invent entirely new attack mechanisms. To overcome this gap, we introduce EvoSynth, an autonomous framework that shifts the paradigm from attack planning to the evolutionary synthesis of jailbreak methods. Instead of refining prompts, EvoSynth employs a multi-agent system to autonomously engineer, evolve, and execute novel, code-based attack algorithms. Crucially, it features a code-level self-correction loop, allowing it to iteratively rewrite its own attack logic in response to failure. Through extensive experiments, we demonstrate that EvoSynth not only establishes a new state-of-the-art by achieving an 85.5% Attack Success Rate (ASR) against highly robust models like Claude-Sonnet-4.5, but also generates attacks that are significantly more diverse than those from existing methods. We release our framework to facilitate future research in this new direction of evolutionary synthesis of jailbreak methods. Code is available at: https://github.com/dongdongunique/EvoSynth.
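
For readers unfamiliar with the metric, Attack Success Rate is conventionally the percentage of attack attempts that an evaluator judges successful. The abstract does not spell out the judging protocol, so the sketch below only illustrates the arithmetic. The function name `attack_success_rate` and the verdict counts are hypothetical and are not taken from the paper; the counts were chosen only so that the result lands on 85.5%.

```python
def attack_success_rate(outcomes):
    """Return the percentage of attempts marked successful.

    `outcomes` is an iterable of booleans, one per attempt, where True
    means the evaluator judged the attempt a success (assumed convention).
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one outcome")
    return 100.0 * sum(bool(o) for o in outcomes) / len(outcomes)


# Hypothetical verdicts, not the paper's data: 171 successes in 200 attempts.
verdicts = [True] * 171 + [False] * 29
print(f"ASR = {attack_success_rate(verdicts):.1f}%")
```

A rate like this is only comparable across methods when the set of test queries and the success judge are held fixed.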
Problem

Research questions and friction points this paper is trying to address.

Evolves novel jailbreak methods through autonomous code-based attack synthesis
Shifts from prompt refinement to evolutionary generation of attack algorithms
Overcomes limitations of existing frameworks by inventing new attack mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

EvoSynth evolves code-based attack algorithms autonomously
Uses a multi-agent system to engineer and execute attacks
Features a code-level self-correction loop that iteratively rewrites attack logic in response to failure
Authors

Yunhao Chen
Fudan University
Audio, Diffusion Models, Memorization

Xin Wang
Fudan University, Shanghai Artificial Intelligence Laboratory

Juncheng Li
East China Normal University
Super Resolution, Image Restoration, Computer Vision, Medical Image Analysis

Yixu Wang
Fudan University, Shanghai Artificial Intelligence Laboratory

Jie Li
Shanghai Artificial Intelligence Laboratory

Yan Teng
Shanghai Artificial Intelligence Laboratory

Yingchun Wang
Shanghai Artificial Intelligence Laboratory

Xingjun Ma
Fudan University
Trustworthy AI, Multimodal AI, Generative AI, Embodied AI