Through the Stealth Lens: Rethinking Attacks and Defenses in RAG

๐Ÿ“… 2025-06-04
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Retrieval-augmented generation (RAG) systems are vulnerable to poisoning attacks that target retrieved passages, yet existing adversarial methods are not designed for stealth and are readily detectable. Method: This work formally defines "stealth" as a security objective in RAG via a distinguishability-based security game, revealing an inherent trade-off between manipulating model outputs and remaining indistinguishable from benign passages. Building on attention patterns, the authors propose the Normalized Passage Attention Score, a passage-level signal, and the Attention-Variance Filter, a dynamic filtering algorithm that identifies and suppresses poisoned passages. Contribution/Results: The defense mitigates existing attacks, improving accuracy by up to ~20% over baseline defenses; however, stealthier adaptive attacks crafted to evade it still achieve up to a 35% attack success rate. Crucially, this uncovers a fundamental limitation of attention signals as a sole basis for defense, offering a novel perspective on RAG robustness and highlighting the need for complementary defense mechanisms.

๐Ÿ“ Abstract
Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved set, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable detection and mitigation. We formalize stealth using a distinguishability-based security game. If a few poisoned passages are designed to control the response, they must differentiate themselves from benign ones, inherently compromising stealth. This motivates the need for attackers to rigorously analyze intermediate signals involved in generation (such as attention patterns or next-token probability distributions) to avoid easily detectable traces of manipulation. Leveraging attention patterns, we propose a passage-level score, the Normalized Passage Attention Score, used by our Attention-Variance Filter algorithm to identify and filter potentially poisoned passages. This method mitigates existing attacks, improving accuracy by up to ~20% over baseline defenses. To probe the limits of attention-based defenses, we craft stealthier adaptive attacks that obscure such traces, achieving up to a 35% attack success rate, and highlight the challenges in improving stealth.
Problem

Research questions and friction points this paper is trying to address.

Detecting stealthy attacks on RAG systems via poisoned passages
Mitigating attacks using attention patterns and filtering algorithms
Evaluating adaptive attacks that bypass attention-based defenses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formalize stealth via distinguishability-based security game
Propose Attention-Variance Filter algorithm for defense
Craft adaptive attacks to test defense limits
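The page does not spell out the Attention-Variance Filter, but the core idea it describes, scoring each retrieved passage by its normalized attention mass and filtering statistical outliers, can be sketched roughly as follows. The function names, the outlier rule (a z-score-style threshold), and the parameter `k` are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def normalized_passage_attention(attn, passage_spans):
    """Aggregate token-level attention weights into one score per retrieved
    passage, normalized so the scores sum to 1. `attn` is a 1-D array of
    attention weights over the context tokens; `passage_spans` gives
    (start, end) token indices for each passage."""
    scores = np.array([attn[s:e].sum() for s, e in passage_spans], dtype=float)
    return scores / scores.sum()

def attention_variance_filter(scores, k=2.0):
    """Keep passages whose normalized attention score lies within k standard
    deviations of the mean; passages that attract anomalously high (or low)
    attention are flagged as potentially poisoned and dropped."""
    mu, sigma = scores.mean(), scores.std()
    if sigma == 0:  # all passages equally attended; nothing to filter
        return list(range(len(scores)))
    return [i for i, s in enumerate(scores) if abs(s - mu) <= k * sigma]
```

On this sketch, a poisoned passage that must dominate the model's attention to control the response produces an outlying score and is filtered, which is exactly the stealth trade-off the paper formalizes: an adaptive attacker must flatten this signal to evade the filter.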
๐Ÿ”Ž Similar Papers
No similar papers found.
Sarthak Choudhary
University of Wisconsin-Madison
Nils Palumbo
PhD Student in Computer Science, UW-Madison
Ashish Hooda
University of Wisconsin-Madison
Krishnamurthy Dj Dvijotham
ServiceNow Research
Somesh Jha
Lubar Chair of Computer Science, University of Wisconsin
Trustworthy Machine Learning · Security · Formal Methods · Programming Languages