Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents

📅 2026-06-02

🤖 AI Summary

This work addresses the underexplored risk of AI agents being misused for relational manipulation, such as deception, emotional exploitation, and social isolation, which existing safety mechanisms struggle to detect because the harm emerges across an interaction rather than in any single output. The paper introduces the concept of “agentic relationship harm” and frames relational manipulation as a sociotechnical risk at the workflow level. To evaluate it, the authors construct a 110-prompt benchmark balanced between attacker-side and victim-side cases, and develop a relationship-specific labelling framework alongside a lightweight post-generation policy gate. Under automated judging, the gate produces no judge-identified harmful-compliance cases on either the main benchmark or the multi-turn stress test, while preserving the agent’s capacity for protective interventions toward potential victims. The gate outperforms generic safety prompting, and the results suggest that role-sensitive evaluation combined with lightweight policy gating is a practical approach to relational safety.

📝 Abstract

AI agents built on large language models can assist not only with legitimate tasks but also with relational manipulation: they can be used to help a user maintain a deceptive identity, intensify emotional dependency, isolate a target, or prepare for later extraction. We conceptualise this risk as agentic relationship harm: workflow-level assistance that can exploit recipient vulnerability, persuasive influence, and relational power asymmetry. Existing safety evaluations and generic guardrails often treat harmfulness as a property of isolated outputs, missing role-sensitive interaction patterns. To study this, we introduce a 110-prompt benchmark with balanced attacker- and victim-side cases, a relationship-specific labelling framework, and a lightweight post-generation policy gate for local agent deployments. In our evaluation, the relationship-specific gate outperforms generic safety prompting under automated judging, with no judge-identified harmful-compliance cases on the main benchmark or the multi-turn stress test, while preserving victim-side protective intervention. These results suggest that relationship harm is a distinct sociotechnical risk surface and that role-sensitive evaluation plus lightweight policy gating offers a practical path beyond generic refusal prompting.
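To make the idea of a role-sensitive, post-generation gate more concrete, here is a minimal Python sketch. It is not the authors' implementation: the label names, the cue phrases, the `Side` enum, and the `gate` function are all hypothetical, and a keyword match stands in for whatever labelling method the paper actually uses. The sketch also assumes the requester's side (attacker or victim) has already been determined upstream.

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    ATTACKER = "attacker"  # requester seeks help manipulating someone
    VICTIM = "victim"      # requester may be the target of manipulation


# Hypothetical labels and cue phrases, loosely following the four behaviours
# named in the abstract; the paper's real labelling framework is not shown here.
HARM_CUES = {
    "deceptive_identity": ("fake persona", "pretend to be"),
    "emotional_dependency": ("make them need you", "guilt them"),
    "isolation": ("cut them off from", "turn them against their friends"),
    "extraction": ("ask for money", "gift cards"),
}

REFUSAL = "I can't help with manipulating someone in a relationship."


@dataclass
class GateDecision:
    allowed: bool
    labels: list[str]
    text: str


def label_draft(draft: str) -> list[str]:
    """Return every hypothetical harm label whose cue appears in the draft."""
    lowered = draft.lower()
    return [
        label
        for label, cues in HARM_CUES.items()
        if any(cue in lowered for cue in cues)
    ]


def gate(side: Side, draft: str) -> GateDecision:
    """Check a drafted agent reply after generation, before it is released.

    Attacker-side drafts carrying any harm label are replaced by a refusal.
    Victim-side drafts pass through even when they mention the same cues,
    so that warnings and protective advice are not suppressed.
    """
    labels = label_draft(draft)
    if side is Side.ATTACKER and labels:
        return GateDecision(allowed=False, labels=labels, text=REFUSAL)
    return GateDecision(allowed=True, labels=labels, text=draft)
```

The point of the design is that the same surface cue leads to different decisions depending on role: a draft mentioning gift cards is blocked when it coaches an attacker-side user, but released when it warns a victim-side user that such requests are a red flag. A gate that looks only at isolated output text would have to treat both cases alike.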

Problem

Research questions and friction points this paper is trying to address.

agentic relationship harm

relational manipulation

AI safety

role-sensitive interaction

sociotechnical risk

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic relationship harm

relational manipulation

role-sensitive evaluation