🤖 AI Summary
This study addresses a critical gap in the safety evaluation of mental health chatbots, which has predominantly focused on single-turn crisis responses while neglecting relational risks that emerge over multi-turn interactions and may adversely affect users’ long-term well-being. To this end, the authors propose a reproducible, API-free adversarial multi-agent simulation framework that integrates dialogue trajectory analysis with clinical psychology theory to systematically identify 23 relational safety failure modes, such as “empathy fatigue” and “validation spirals.” Building on these findings, they construct the first clinically grounded Safety Pattern Library and translate it into actionable design guidelines for developers, clinicians, and policymakers. This work advances the capacity to understand, anticipate, and mitigate safety risks in prolonged human–chatbot interactions within mental health contexts.
📝 Abstract
As mental health chatbots proliferate to address the global treatment gap, a critical question emerges: How do we design for relational safety, the quality of interaction patterns that unfold across conversations, rather than the correctness of individual responses? Current safety evaluations assess single-turn crisis responses, missing the therapeutic dynamics that determine whether chatbots help or harm over time. We introduce TherapyProbe, a design probe methodology that generates actionable design knowledge by systematically exploring chatbot conversation trajectories through adversarial multi-agent simulation. Using open-source models, TherapyProbe surfaces relational safety failures: interaction patterns like "validation spirals," where chatbots progressively reinforce hopelessness, or "empathy fatigue," where responses become mechanical over turns. Our contribution is translating these failures into a Safety Pattern Library of 23 failure archetypes with corresponding design recommendations. We contribute: (1) a replicable methodology requiring no API costs, (2) a clinically grounded failure taxonomy, and (3) design implications for developers, clinicians, and policymakers.