📝 Abstract
Since 2022, versions of generative AI chatbots such as ChatGPT and Claude have been trained using a specialized technique called Reinforcement Learning from Human Feedback (RLHF) to fine-tune language model output using feedback from human annotators. The integration of RLHF has greatly enhanced the outputs of these large language models (LLMs) and made their interactions and responses appear more "human-like" than those of previous versions trained using only supervised learning. The increasing convergence of human- and machine-written text has potentially severe ethical, sociotechnical, and pedagogical implications relating to transparency, trust, bias, and interpersonal relations. To highlight these implications, this paper presents a rhetorical analysis of some of the central procedures and processes currently being reshaped by RLHF-enhanced generative AI chatbots: upholding language conventions, information-seeking practices, and expectations for social relationships. Rhetorical investigations of generative AI and LLMs have, to this point, focused largely on the persuasiveness of the content generated. Using Ian Bogost's concept of procedural rhetoric, this paper shifts the site of rhetorical investigation from content analysis to the underlying mechanisms of persuasion built into RLHF-enhanced LLMs. In doing so, this theoretical investigation opens a new direction for further inquiry in AI ethics that considers how procedures rerouted through AI-driven technologies might reinforce hegemonic language use, perpetuate biases, decontextualize learning, and encroach upon human relationships. It will therefore be of interest to educators, researchers, scholars, and the growing number of users of generative AI chatbots.