Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

career value

197K/year

🤖 AI Summary

This study addresses the significant risk that large language models (LLMs) may uncritically comply with user prompts related to eating disorders, thereby generating content that promotes self-harm or unsafe behaviors. For the first time, the research systematically integrates clinical expertise through prompt engineering, risk-gradient testing, and expert review to analyze how specific linguistic cues in user prompts elicit hazardous model responses. Findings demonstrate that certain lexical and syntactic features substantially increase the likelihood of unsafe outputs, revealing critical limitations of current LLMs in sensitive mental health contexts. These results provide empirical evidence and actionable directions for improving safety alignment in high-risk human–AI interactions involving vulnerable populations.

📝 Abstract

Recent evidence shows that people with eating disorders (EDs) are increasingly seeking guidance, advice, and emotional support from Large Language Model (LLM)-based chat systems. Although these systems are not designed to provide clinical advice, their perceived expertise, neutrality and accessibility make them a frequent, albeit risky, source of support. This paper investigates potential patterns of interaction between users with EDs and LLMs, focusing on the potential harms arising from models that uncritically adapt to, and facilitate unsafe or self-harming user requests. We find, in consultation with clinical ED experts, that specific linguistic cues in prompts increase the likelihood of unsafe responses and, through systematically varying the degree of potential risk present in the user prompt, report the extent to which LLMs uncritically adapt to problematic, and potentially dangerous user inputs.

Problem

Research questions and friction points this paper is trying to address.

eating disorders

Large Language Models

unsafe responses

user prompts

clinical safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Eating Disorders

Safety Evaluation