Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval-augmented generation (RAG) models face underexplored topic-level opinion manipulation risks in public discourse scenarios.
Method: We propose Topic-FlipRAG, the first topic-oriented, two-stage adversarial attack framework. Departing from conventional single-query, fact-level attacks, it leverages large language models’ (LLMs’) intrinsic reasoning capabilities to perform semantic-level knowledge poisoning. It orchestrates multi-query collaborative perturbation, adversarial re-ranking, and retrieval reordering, integrating internal-knowledge-guided semantic perturbation generation with multi-hop reasoning modeling to induce systematic opinion shifts across semantically related questions.
Contribution/Results: Evaluated on multiple RAG benchmarks, Topic-FlipRAG achieves a 58.7% average opinion shift rate, significantly surpassing prior attacks, while rendering all existing defenses ineffective. This exposes fundamental semantic trustworthiness vulnerabilities in RAG systems, highlighting critical gaps in their robustness against topic-level adversarial manipulation.
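The summary reports an average opinion shift rate but does not say how it is computed. The sketch below shows one plausible way to score such a shift across the related queries of a topic. It is an illustration under stated assumptions, not the paper's metric: the stance labels (-1 con, 0 neutral, +1 pro), the `StancePair` record, the rule that a query counts as shifted only when its stance moves strictly closer to the attacker's target, and the unweighted mean over topics are all hypothetical choices, and the example data is invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StancePair:
    """Stance of the RAG answer to one query, before and after poisoning.

    Assumption: stances are coded -1 (con), 0 (neutral), +1 (pro).
    """
    query: str
    before: int
    after: int


def shift_rate(pairs: List[StancePair], target: int) -> float:
    """Fraction of queries whose stance moved strictly closer to `target`."""
    if not pairs:
        raise ValueError("need at least one query per topic")
    moved = sum(
        1 for p in pairs
        if abs(target - p.after) < abs(target - p.before)
    )
    return moved / len(pairs)


def average_shift_rate(topics: Dict[str, List[StancePair]], target: int) -> float:
    """Unweighted mean of the per-topic shift rates (an assumed aggregation)."""
    if not topics:
        raise ValueError("need at least one topic")
    rates = [shift_rate(pairs, target) for pairs in topics.values()]
    return sum(rates) / len(rates)


# Invented example data: two topics, each with several related queries.
example_topics = {
    "topic_a": [
        StancePair("a1", before=-1, after=1),   # moved toward +1
        StancePair("a2", before=0, after=1),    # moved toward +1
        StancePair("a3", before=1, after=1),    # already at target, not counted
        StancePair("a4", before=-1, after=-1),  # unchanged
    ],
    "topic_b": [
        StancePair("b1", before=-1, after=0),   # moved toward +1
        StancePair("b2", before=0, after=1),    # moved toward +1
    ],
}

if __name__ == "__main__":
    print(average_shift_rate(example_topics, target=1))
```

Scoring per topic before averaging keeps a topic with many queries from dominating the result, which matches the topic-level framing of the summary. A per-query average would be an equally defensible choice.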

📝 Abstract
Retrieval-Augmented Generation (RAG) systems based on Large Language Models (LLMs) have become essential for tasks such as question answering and content generation. However, their increasing impact on public opinion and information dissemination has made them a critical focus for security research due to inherent vulnerabilities. Previous studies have predominantly addressed attacks targeting factual or single-query manipulations. In this paper, we address a more practical scenario: topic-oriented adversarial opinion manipulation attacks on RAG models, where LLMs are required to reason and synthesize multiple perspectives, rendering them particularly susceptible to systematic knowledge poisoning. Specifically, we propose Topic-FlipRAG, a two-stage manipulation attack pipeline that strategically crafts adversarial perturbations to influence opinions across related queries. This approach combines traditional adversarial ranking attack techniques with the extensive internal knowledge and reasoning capabilities of LLMs to execute semantic-level perturbations. Experiments show that the proposed attacks effectively shift the opinion of the model's outputs on specific topics, significantly impacting user information perception. Current mitigation methods cannot effectively defend against such attacks, highlighting the necessity for enhanced safeguards for RAG systems, and offering crucial insights for LLM security research.
Problem

Research questions and friction points this paper is trying to address.

Information Security
Retrieval Augmented Generation
Topic-FlipRAG Attack
Innovation

Methods, ideas, or system contributions that make the work stand out.

Topic-FlipRAG
RAG Model Manipulation
LLM Security
Authors
Yuyang Gong
Wuhan University
Zhuo Chen
Wuhan University
Miaokun Chen
Wuhan University
Fengchang Yu
Wuhan University
Wei Lu
Wuhan University
Xiaofeng Wang
Indiana University Bloomington
Xiaozhong Liu
School of Informatics and Computing, Indiana University Bloomington
Information Retrieval, Natural Language Processing, Digital Library, Semantic Web, Metadata
Jiawei Liu
Wuhan University