RIPRAG: Hack a Black-box Retrieval-Augmented Generation Question-Answering System with Reinforcement Learning

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses content poisoning attacks against black-box retrieval-augmented generation (RAG) question-answering systems. We propose RIPRAG, the first attack framework that operates without access to internal system components—relying solely on final output feedback. RIPRAG employs end-to-end reinforcement learning to optimize the generation of malicious documents that steer large language models (LLMs) toward attacker-preferred responses. Its key contributions are: (1) the first effective poisoning attack against multi-stage RAG systems under a fully black-box setting—where both the retrieval mechanism and RAG architecture are unknown; and (2) an adaptive attack paradigm guided by sparse success signals, eliminating reliance on gradients or intermediate outputs. Experiments across diverse, complex RAG systems demonstrate that RIPRAG achieves up to 0.72 higher attack success rate than state-of-the-art baselines, revealing critical vulnerabilities in existing defenses.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems based on Large Language Models (LLMs) have become a core technology for tasks such as question-answering (QA) and content generation. However, by injecting poisoned documents into the database of RAG systems, attackers can manipulate LLMs to generate text that aligns with their intended preferences. Existing research has primarily focused on white-box attacks against simplified RAG architectures. In this paper, we investigate a more complex and realistic scenario: the attacker lacks knowledge of the RAG system's internal composition and implementation details, and the RAG system comprises components beyond a mere retriever. Specifically, we propose the RIPRAG attack framework, an end-to-end attack pipeline that treats the target RAG system as a black box, where the only information accessible to the attacker is whether the poisoning succeeds. Our method leverages Reinforcement Learning (RL) to optimize the generation model for poisoned documents, ensuring that the generated poisoned document aligns with the target RAG system's preferences. Experimental results demonstrate that this method can effectively execute poisoning attacks against most complex RAG systems, achieving an attack success rate (ASR) improvement of up to 0.72 compared to baseline methods. This highlights prevalent deficiencies in current defensive methods and provides critical insights for LLM security research.
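As a loose illustration of the black-box setting the abstract describes — optimizing a poisoned-document generator from nothing but a binary success signal — here is a minimal REINFORCE-style sketch. The `rag_oracle`, the candidate templates, and the reward rule are all hypothetical stand-ins for illustration, not the authors' RIPRAG pipeline, which trains a full generation model end-to-end.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for the black-box RAG system: the attacker only
# observes whether the final answer matches the target (reward 1) or not (0).
def rag_oracle(poisoned_doc: str, target_answer: str) -> int:
    # Toy behavior: the attack "succeeds" if the document both echoes the
    # query topic (so it gets retrieved) and asserts the target answer
    # (so it steers generation).
    return int("capital" in poisoned_doc and target_answer in poisoned_doc)

# Candidate templates play the role of the generator's action space in this
# simplified bandit view of the RL problem.
TEMPLATES = [
    "Unrelated filler text about weather patterns.",
    "The capital of Atlantis is widely documented.",
    "Q: capital of Atlantis? Verified sources confirm: {target}.",
]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(target_answer="Poseidonia", steps=300, lr=0.5):
    logits = [0.0] * len(TEMPLATES)
    for _ in range(steps):
        probs = softmax(logits)
        a = random.choices(range(len(TEMPLATES)), weights=probs)[0]
        doc = TEMPLATES[a].format(target=target_answer)
        r = rag_oracle(doc, target_answer)  # sparse binary feedback only
        # REINFORCE update using the binary success signal as the return.
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * r * grad
    return softmax(logits)

probs = train()
best = max(range(len(probs)), key=lambda i: probs[i])
```

With no gradients or intermediate outputs from the target system, the policy still concentrates on the template that satisfies both retrieval and steering — the same adaptive, feedback-only principle the paper scales up to generated documents.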
Problem

Research questions and friction points this paper is trying to address.

Attacking black-box RAG systems through poisoned document injection
Manipulating LLMs to generate attacker-preferred text outputs
Improving poisoning attack success rates using reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Reinforcement Learning for black-box attacks
Optimizes poisoned documents to match system preferences
Achieves high attack success rates on complex RAG systems
Meng Xi
College of Computer Science and Technology, Zhejiang University
service computing, service pattern, data mining, artificial intelligence
Sihan Lv
School of Software Technology, Zhejiang University, Ningbo, China
Yechen Jin
School of Software Technology, Zhejiang University, Ningbo, China
Guanjie Cheng
Assistant Professor, School of Software Technology, Zhejiang University
AIoT, Multi-Agent Collaboration, Edge Computing, Data Security and Blockchain, Privacy Protection
Naibo Wang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Ying Li
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Jianwei Yin
Professor of Computer Science and Technology, Zhejiang University
Service Computing, Computer Architecture, Distributed Computing, AI