Through the Stealth Lens: Rethinking Attacks and Defenses in RAG

๐Ÿ“… 2025-06-04
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Retrieval-augmented generation (RAG) systems are vulnerable to poisoning attacks that target retrieved passages, yet existing adversarial methods are not designed for stealth and are readily detectable. Method: This work formally defines "stealth" as a security objective in RAG via a distinguishability-based security game, revealing an inherent trade-off between manipulating model outputs and remaining indistinguishable from benign passages. Building on attention patterns, the authors propose the Normalized Passage Attention Score, a passage-level signal, and the Attention-Variance Filter, a dynamic filtering algorithm that identifies and suppresses poisoned passages. Contribution/Results: The defense mitigates existing attacks, improving accuracy by up to ~20% over baseline defenses; however, stealthier adaptive attacks crafted to evade it still achieve up to a 35% attack success rate. Crucially, this uncovers a fundamental limitation of attention signals as a sole basis for defense, offering a novel perspective on RAG robustness and highlighting the need for complementary defense mechanisms.

๐Ÿ“ Abstract
Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved set, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable detection and mitigation. We formalize stealth using a distinguishability-based security game. If a few poisoned passages are designed to control the response, they must differentiate themselves from benign ones, inherently compromising stealth. This motivates the need for attackers to rigorously analyze intermediate signals involved in generation (such as attention patterns or next-token probability distributions) to avoid easily detectable traces of manipulation. Leveraging attention patterns, we propose a passage-level score, the Normalized Passage Attention Score, used by our Attention-Variance Filter algorithm to identify and filter potentially poisoned passages. This method mitigates existing attacks, improving accuracy by up to ~20% over baseline defenses. To probe the limits of attention-based defenses, we craft stealthier adaptive attacks that obscure such traces, achieving up to a 35% attack success rate, and highlight the challenges in improving stealth.
Problem

Research questions and friction points this paper is trying to address.

Detecting stealthy attacks on RAG systems via poisoned passages
Mitigating attacks using attention patterns and filtering algorithms
Evaluating adaptive attacks that bypass attention-based defenses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formalize stealth via distinguishability-based security game
Propose Attention-Variance Filter algorithm for defense
Craft adaptive attacks to test defense limits
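The page does not spell out the Attention-Variance Filter, but the core idea it describes, scoring each retrieved passage by its normalized attention mass and filtering statistical outliers, can be sketched roughly as follows. The function names, the outlier rule (a z-score-style threshold), and the parameter `k` are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def normalized_passage_attention(attn, passage_spans):
    """Aggregate token-level attention weights into one score per retrieved
    passage, normalized so the scores sum to 1. `attn` is a 1-D array of
    attention weights over the context tokens; `passage_spans` gives
    (start, end) token indices for each passage."""
    scores = np.array([attn[s:e].sum() for s, e in passage_spans], dtype=float)
    return scores / scores.sum()

def attention_variance_filter(scores, k=2.0):
    """Keep passages whose normalized attention score lies within k standard
    deviations of the mean; passages that attract anomalously high (or low)
    attention are flagged as potentially poisoned and dropped."""
    mu, sigma = scores.mean(), scores.std()
    if sigma == 0:  # all passages equally attended; nothing to filter
        return list(range(len(scores)))
    return [i for i, s in enumerate(scores) if abs(s - mu) <= k * sigma]
```

On this sketch, a poisoned passage that must dominate the model's attention to control the response produces an outlying score and is filtered, which is exactly the stealth trade-off the paper formalizes: an adaptive attacker must flatten this signal to evade the filter.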
๐Ÿ”Ž Similar Papers
No similar papers found.
Sarthak Choudhary
University of Wisconsin-Madison
Nils Palumbo
PhD Student in Computer Science, UW-Madison
Ashish Hooda
University of Wisconsin-Madison
Krishnamurthy Dj Dvijotham
ServiceNow Research
Somesh Jha
Lubar Chair of Computer Science, University of Wisconsin
Trustworthy Machine Learning · Security · Formal Methods · Programming Languages