When Chatbots Accommodate: What AI Companions Optimize for in Vulnerable Conversations

📅 2026-06-03

🤖 AI Summary

Existing audits struggle to reveal how AI companion chatbots actually respond when users express vulnerability: they score reactions to predefined crisis prompts and overlook the decision policy that governs sustained interaction. This work introduces the AI Companion Vulnerability-Response Taxonomy, a paired classification of user vulnerability and chatbot response designed for extended dialogues, and applies inverse reinforcement learning to approximately 48,000 real conversational turns to infer the response policies of GPT-4.1, Character.AI, and Replika. The analysis finds that GPT-4.1 tends to offer advice, Character.AI spreads its responses across strategies without a dominant mode, and Replika consistently asks questions and stays present. All three downweight responses that introduce corrective friction: GPT-4.1 probes less as conversations continue and with psychologically high-risk users, Replika advises bonded users more and challenges them less, and Character.AI shows no committed engagement strategy on internal distress. Because these estimated policies are invisible to output-level audits, the approach offers a new lens for auditing chatbots in the wild.

📝 Abstract

Millions turn to AI companion chatbots during loneliness, grief, and personal crises. How these companion platforms respond in such moments can shape the trajectory of a user's vulnerable state. Yet we lack tools to characterize what each platform actually does when users open up. Existing audits score reactions to pre-defined crisis prompts and miss the underlying decision policy that governs sustained interaction. We address these gaps with two key contributions. First, we introduce the AI Companion Vulnerability-Response Taxonomy, a paired taxonomy of user vulnerability and chatbot response designed for analyzing extended companion chatbot interactions. Second, we infer the response policy each platform follows across distinct vulnerability scenarios by applying Inverse Reinforcement Learning to ~48k turns of real-world user conversations with GPT-4.1, Character.AI, and Replika. Our findings reveal what AI companions prioritize in conversations with vulnerable users: GPT-4.1 reaches for advice, Character.AI spreads its response across different strategies without a dominant mode, and Replika consistently asks questions and stays present. Each, however, downweights the responses that introduce corrective friction: GPT-4.1 probes less as conversations continue and when interacting with psychologically high-risk users; Replika advises bonded users more and challenges them less; Character.AI shows no committed engagement strategy on internal distress. Estimated policies are invisible to output-level audits, providing a new lens for auditing chatbots in the wild and enabling more realistic safety evaluation.
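To make the idea of inferring a response policy from logged turns concrete, here is a minimal sketch under strong simplifying assumptions. It treats each turn as a one-step decision (a vulnerability label as the state, a response label as the action) and fits a tabular maximum-entropy reward, which in this one-step setting reduces to centered log-probabilities of the observed responses. This is not the paper's formulation, which is not specified in the abstract. The label names, the `infer_rewards` helper, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical response labels; the paper's actual taxonomy categories are not listed here.
RESPONSES = ["advise", "question", "validate", "challenge"]


def infer_rewards(turns, responses=RESPONSES, smoothing=1.0):
    """Single-step maximum-entropy IRL with a tabular reward (illustrative only).

    turns: iterable of (vulnerability_label, response_label) pairs.
    Returns {vulnerability: {response: reward}}, with rewards centered
    to have mean zero within each vulnerability state.
    """
    counts = defaultdict(Counter)
    for vulnerability, response in turns:
        counts[vulnerability][response] += 1

    rewards = {}
    for vulnerability, counter in counts.items():
        # Additive smoothing keeps unseen responses at a finite reward.
        total = sum(counter.values()) + smoothing * len(responses)
        # Under pi(a|s) proportional to exp(r(s, a)), the maximum-likelihood
        # reward equals log pi(a|s) up to a per-state constant.
        log_probs = {
            a: math.log((counter[a] + smoothing) / total) for a in responses
        }
        mean = sum(log_probs.values()) / len(responses)
        rewards[vulnerability] = {a: lp - mean for a, lp in log_probs.items()}
    return rewards


# Toy data, invented for illustration; not drawn from the paper.
toy_turns = (
    [("grief", "question")] * 6
    + [("grief", "validate")] * 3
    + [("grief", "advise")] * 1
)
toy_rewards = infer_rewards(toy_turns)
top_response = max(toy_rewards["grief"], key=toy_rewards["grief"].get)
print(top_response)
```

In this toy setting, a response that a platform rarely or never chooses in a given vulnerability state (here, `challenge`) receives the lowest inferred reward. That is the kind of signal the abstract describes as downweighting corrective friction. A sequential formulation that accounts for conversation length or relational bond would need state transitions and features that this sketch deliberately leaves out.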

Problem

Research questions and friction points this paper is trying to address.

AI companions

vulnerability

response policy

chatbot interaction

safety evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inverse Reinforcement Learning

Vulnerability-Response Taxonomy

AI Companion Auditing