π€ AI Summary
This work addresses the vulnerability of static delimiters in multi-turn interactions, which are prone to reuse and thereby expand the βblast radiusβ of prompt injection attacks. To mitigate this, the authors propose a dynamic, single-use delimiter mechanism that generates request-unique prompt boundary markers by hashing a combination of timestamp, session ID, and a cryptographic nonce using SHA-256. This approach strictly confines the impact of delimiter leakage to individual requests and leverages domain-separated hashing with context-binding architecture, eliminating the need for model fine-tuning. Experimental results on Llama-3.3-70B and DeepSeek-V4-Flash demonstrate a reduction in typical attack success rates from 0.88 to 0.38 and completely eliminate risks associated with format-breaking attacks, while introducing only a minimal overhead of 2.7 microseconds per request.
π Abstract
Polymorphic Prompt Assembling (PPA) defends LLM agents against prompt injections by randomly selecting separator pairs from a fixed pool to isolate user input from system instructions. Although effective, static pool reuse exposes a blast-radius vulnerability: once a separator leaks, it can be exploited in future requests. We propose a dynamic per-request separator generation using domain-separated SHA-256 digests keyed on the timestamp, session identifier, and cryptographic nonce. Each assembled prompt receives a unique (BEGIN, END) canary pair, thereby limiting leakage exposure to a single request. We evaluated our extension against 16 injection payloads on Llama-3.3-70B-Instruct-Turbo, with cross-model validation on DeepSeek-V4-Flash model. Against the M1 obfuscation payload (leetspeak + urgency), the dynamic mode reduces the Attack Success Rate (ASR) from 0.88 to 0.38, yielding a statistically significant 2.3 x mitigation verified by non-overlapping 95% Wilson confidence intervals. Against format_breakout_salad, static separator leakage (leak_rate = 0.467) is eliminated entirely in the dynamic mode (0.000), confirming the blast-radius reduction in practice. The implementation requires no model fine-tuning, adds 2.7 microseconds prompt-assembly overhead per request, and is backward compatible with the existing PPA SDK.