Influence Factors on RAG Poisoning

📅 2026-06-09
🤖 AI Summary
This study systematically investigates what makes Retrieval-Augmented Generation (RAG) systems vulnerable to poisoning attacks. Using a full-factorial experiment over 432 configurations, it evaluates how dataset, retriever type (dense, graph-based, and BM25), retrieval depth, knowledge-base composition, chunking strategy, and generator model affect robustness. Retriever architecture, dataset, and retrieval depth are the strongest factors in poisoning exposure, while generator choice and knowledge-base composition strongly affect downstream attack success. Dense and graph-based retrievers are generally more robust than BM25, whereas greater retrieval depth raises the likelihood of retrieving poisoned passages. Replicating poisoned content across multiple databases amplifies adversarial influence, while additional clean sources can mitigate it. Overall, RAG's susceptibility arises from interactions among retrieval, generation, and knowledge-base configuration rather than from any single component's deficiency.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems enhance large language models by grounding responses in retrieved documents from external knowledge sources at inference time. However, this reliance on retrieved content introduces vulnerabilities to poisoning attacks, in which adversarial documents can manipulate both the retrieval process and the generated outputs. This paper investigates poisoning robustness in RAG through a full factorial experimental study covering 432 configurations. We analyze the impacts of dataset, retriever type, retrieval depth, database composition, chunking strategy, and generator model on retrieval-level and generation-level metrics. The results show that retriever architecture, dataset, and retrieval depth are the strongest factors affecting poisoning exposure, while generator choice and database composition have a major impact on downstream attack success. Dense and graph-based retrievers generally improve robustness relative to BM25, whereas larger retrieval depth increases the likelihood of retrieving poisoned passages. We further show that replicating poisoned content across multiple databases amplifies adversarial influence, while additional clean sources can mitigate it. These findings highlight that poisoning vulnerability in RAG is not attributable to a single component, but instead arises from the interaction of retrieval, generation, and knowledge-base configuration.
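To make the full factorial design concrete, the following is a minimal sketch of how such a configuration grid could be enumerated. The six factor names and the three retriever types come from the abstract; every other level name and level count is a hypothetical placeholder, chosen only so that the product equals 432. The abstract does not state the actual levels, so this should not be read as the authors' setup.

```python
from itertools import product

# Factor names follow the abstract. The levels are HYPOTHETICAL placeholders:
# only the three retriever types are named in the source. The level counts
# (3 x 3 x 3 x 4 x 2 x 2) are an assumption picked so the grid has 432 cells.
FACTORS = {
    "dataset": ["dataset_a", "dataset_b", "dataset_c"],
    "retriever": ["dense", "graph", "bm25"],
    "retrieval_depth": [1, 5, 10],
    "db_composition": ["single", "poison_replicated", "extra_clean", "mixed"],
    "chunking": ["fixed", "semantic"],
    "generator": ["generator_a", "generator_b"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels (the Cartesian product)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


configs = full_factorial(FACTORS)
print(len(configs))  # 432
```

In a full factorial design every level of each factor is crossed with every level of all the others, which is what allows interactions between factors (for example retriever type and retrieval depth) to be estimated rather than only their individual effects.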
Problem

Research questions and friction points this paper is trying to address.

RAG poisoning
retrieval-augmented generation
adversarial attacks
poisoning robustness
retrieval vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG poisoning
retrieval robustness
adversarial attacks
dense retrievers
factorial experimental design