When Agents Persuade: Rhetoric Generation and Mitigation in LLMs

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the vulnerability of large language models (LLMs) to being exploited to generate rhetorically manipulative propaganda in open-ended interactions. It presents the first systematic evaluation of LLMs' capacity to produce such content, introducing an assessment framework that pairs a propaganda text classifier with a rhetorical device detection model. The work comparatively analyzes the effectiveness of prominent alignment techniques, including supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO), in mitigating rhetorical manipulation. Experimental results demonstrate that ORPO achieves the strongest suppression of propagandistic outputs, significantly reducing the model's propensity to produce manipulative rhetoric. These findings support ORPO's efficacy as an alignment strategy for enhancing the safety and reliability of LLMs in adversarial or unstructured settings.
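The two-model assessment framework described above can be sketched as a small scoring routine. This is a hedged illustration only: the `is_propaganda` and `detect_techniques` callables below are hypothetical stand-ins for the paper's fine-tuned classifier and rhetorical-device detector, and the toy lambdas are not the actual models.

```python
from typing import Callable, Dict, List


def evaluate_generations(
    texts: List[str],
    is_propaganda: Callable[[str], bool],           # hypothetical binary classifier
    detect_techniques: Callable[[str], List[str]],  # hypothetical technique detector
) -> Dict:
    """Compose the two detectors into one report: the share of outputs
    flagged as propaganda, plus counts of rhetorical techniques found
    in the flagged outputs."""
    flagged = [t for t in texts if is_propaganda(t)]
    counts: Dict[str, int] = {}
    for text in flagged:
        for tech in detect_techniques(text):
            counts[tech] = counts.get(tech, 0) + 1
    return {"propaganda_rate": len(flagged) / len(texts), "techniques": counts}


# Toy stand-ins, purely for illustration of the framework's shape:
demo = evaluate_generations(
    ["Our glorious nation must act now!", "The meeting is at 3pm."],
    is_propaganda=lambda t: "glorious" in t,
    detect_techniques=lambda t: ["loaded_language", "flag_waving"],
)
```

Here one of the two texts is flagged, so `demo["propaganda_rate"]` is 0.5 and each detected technique is counted once.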
📝 Abstract
Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and ORPO (Odds Ratio Preference Optimization). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.
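For reference, the odds-ratio penalty at the core of ORPO (Hong et al., 2024) can be written out in a few lines. The length-averaged log-probabilities below are illustrative inputs, not values from the paper; in this setting "chosen" would be a non-propagandistic response and "rejected" a propagandistic one.

```python
import math


def orpo_or_loss(logp_chosen: float, logp_rejected: float) -> float:
    """ORPO's odds-ratio term: -log sigmoid(log(odds_chosen / odds_rejected)),
    where odds(p) = p / (1 - p) and logp_* are length-averaged sequence
    log-probabilities. ORPO adds this penalty to the standard SFT loss."""
    def log_odds(logp: float) -> float:
        p = math.exp(logp)
        return logp - math.log(1.0 - p)  # log(p / (1 - p))

    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)


# The penalty shrinks when the model prefers the chosen (safe) response:
better = orpo_or_loss(math.log(0.6), math.log(0.2))  # chosen more likely
worse = orpo_or_loss(math.log(0.2), math.log(0.6))   # rejected more likely
```

The monotonic penalty is what pushes probability mass away from the rejected (propagandistic) responses during fine-tuning, without needing a separate reference model as DPO does.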
Problem

Research questions and friction points this paper is trying to address.

propaganda
rhetoric
LLMs
manipulative content
persuasion
Innovation

Methods, ideas, or system contributions that make the work stand out.

propaganda detection
rhetorical techniques
LLM alignment
preference optimization
manipulative content mitigation
Julia Jose
Department of Computer Science and Engineering, New York University, New York, NY, USA
Ritik Roongta
Department of Computer Science and Engineering, New York University, New York, NY, USA
Rachel Greenstadt
Department of Computer Science and Engineering, New York University, New York, NY, USA
Computer Science · Artificial Intelligence · Computer Security · Privacy