🤖 AI Summary
This work addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to corpus poisoning attacks, wherein adversaries inject misleading documents to manipulate large language model outputs. To counter this threat, the authors propose Sparse Document Attention RAG (SDAG), which introduces block-sparse attention into RAG defense for the first time. By modifying only the attention mask at inference time, SDAG blocks cross-attention among retrieved documents, thereby preventing inter-document contamination. This approach requires neither model fine-tuning nor architectural changes and is compatible with existing defense strategies. Experimental results show that SDAG substantially reduces attack success rates across diverse poisoning scenarios, and that combining it with state-of-the-art defenses yields statistically significant improvements over those defenses alone.
📝 Abstract
Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer an LLM's output toward an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, specifically in cases of attacks. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG). This is a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires only a minimal inference-time change to the attention mask; no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering (QA) with a variety of attack strategies on RAG. We show that our SDAG method yields a substantially lower attack success rate than the standard causal attention mechanism. We further demonstrate the clear merits of integrating SDAG with state-of-the-art RAG defense methods. Specifically, the integration results in performance that is statistically significantly better than the state-of-the-art.
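The masking idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sdag_style_mask`, the `GLOBAL` label, and the choice to let instruction and question tokens attend to (and be attended by) every earlier token are assumptions made for illustration. The only property taken from the abstract is that tokens of one retrieved document may not attend to tokens of another.

```python
from typing import List

# Hypothetical label for tokens outside any retrieved document
# (e.g. instructions, the user question). This is an assumption of the sketch.
GLOBAL = -1


def sdag_style_mask(segment_ids: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when query token i may attend to key token j.

    segment_ids[k] is the index of the retrieved document that token k
    belongs to, or GLOBAL for non-document tokens.
    """
    n = len(segment_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only the current and earlier positions
            same_block = segment_ids[i] == segment_ids[j]
            involves_global = GLOBAL in (segment_ids[i], segment_ids[j])
            # Document-to-document attention is allowed only within one document.
            mask[i][j] = same_block or involves_global
    return mask


# Toy layout: 1 instruction token, two 2-token documents, 1 question token.
segments = [GLOBAL, 0, 0, 1, 1, GLOBAL]
mask = sdag_style_mask(segments)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, the rows for the second document show no `x` in the columns of the first document, while the final question row can see every position. In a real model, a boolean mask of this shape would typically be converted to additive form (0 where allowed, a large negative value where blocked) before the softmax; how SDAG treats non-document tokens and positional encodings is not specified in the abstract.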