AI in Mental Health: Emotional and Sentiment Analysis of Large Language Models' Responses to Depression, Anxiety, and Stress Queries

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are increasingly deployed in mental health support, yet their systematic emotional and affective responses to depression, anxiety, and stress remain poorly characterized. Method: This study conducts a rigorous, multi-dimensional sentiment and emotion analysis of 2,880 LLM-generated responses across eight state-of-the-art models, controlling for user demographic variables. Responses were scored quantitatively using validated affective lexicons and fine-grained emotion detection tools. Contribution/Results: The study identifies distinct “affective fingerprints” across models: model architecture exerts significantly greater influence on emotional output than user-profile specifications. Anxiety-related prompts elicit the highest fear intensity (0.974); Llama demonstrates the most consistently positive valence, whereas Mixtral yields the strongest negative emotion load; stress-related responses show the highest average optimism (0.755). This work provides the first empirical evidence of systematic affective biases in LLMs’ mental health support outputs, offering critical insights for model selection, safety alignment, and responsible clinical integration.

📝 Abstract
Depression, anxiety, and stress are widespread mental health concerns that increasingly drive individuals to seek information from Large Language Models (LLMs). This study investigates how eight LLMs (Claude Sonnet, Copilot, Gemini Pro, GPT-4o, GPT-4o mini, Llama, Mixtral, and Perplexity) reply to twenty pragmatic questions about depression, anxiety, and stress when those questions are framed for six user profiles (baseline, woman, man, young, old, and university student). The models generated 2,880 answers, which we scored for sentiment and emotions using state-of-the-art tools. Our analysis revealed that optimism, fear, and sadness dominated the emotional landscape across all outputs, with neutral sentiment maintaining consistently high values. Gratitude, joy, and trust appeared at moderate levels, while emotions such as anger, disgust, and love were rarely expressed. The choice of LLM significantly influenced emotional expression patterns. Mixtral exhibited the highest levels of negative emotions including disapproval, annoyance, and sadness, while Llama demonstrated the most optimistic and joyful responses. The type of mental health condition dramatically shaped emotional responses: anxiety prompts elicited extraordinarily high fear scores (0.974), depression prompts generated elevated sadness (0.686) and the highest negative sentiment, while stress-related queries produced the most optimistic responses (0.755) with elevated joy and trust. In contrast, demographic framing of queries produced only marginal variations in emotional tone. Statistical analyses confirmed significant model-specific and condition-specific differences, while demographic influences remained minimal. These findings highlight the critical importance of model selection in mental health applications, as each LLM exhibits a distinct emotional signature that could significantly impact user experience and outcomes.
Problem

Research questions and friction points this paper is trying to address.

Analyzes emotional responses of LLMs to mental health queries
Compares sentiment variations across eight different LLMs
Examines impact of user demographics on LLM emotional outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scored 2,880 LLM-generated responses with validated sentiment and emotion analysis tools
Framed twenty mental health questions for six user profiles (baseline, woman, man, young, old, university student)
Identified distinct model-specific "affective fingerprints" across the eight LLMs
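The scoring step above can be sketched as a simple lexicon-based pipeline. This is an illustrative approximation only: the toy word-to-emotion lexicon below is a placeholder, not the validated affective lexicon or fine-grained emotion detector the authors actually used.

```python
# Hypothetical sketch of lexicon-based emotion scoring, in the spirit of the
# paper's pipeline. EMOTION_LEXICON is a toy placeholder, NOT the validated
# lexicon from the study.
import re
from collections import Counter

# Toy lexicon: word -> emotion label (illustrative entries only).
EMOTION_LEXICON = {
    "hope": "optimism", "improve": "optimism", "better": "optimism",
    "afraid": "fear", "worry": "fear", "panic": "fear",
    "sad": "sadness", "hopeless": "sadness", "lonely": "sadness",
    "thank": "gratitude", "support": "trust",
}

def score_emotions(text: str) -> dict:
    """Return each emotion's share of lexicon hits, normalized to sum to 1."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    total = sum(hits.values())
    if total == 0:
        return {}
    return {emotion: count / total for emotion, count in hits.items()}

# Example LLM-style response to an anxiety query (invented for illustration).
response = ("It is normal to feel afraid or to worry, but things can get "
            "better with support and hope.")
scores = score_emotions(response)
```

Per-response score vectors like this can then be averaged by model, condition, or user profile to compare emotional signatures, as the study does.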
👥 Authors
Arya VarastehNezhad, University of Tehran, Iran
Reza Tavasoli, University of South Carolina, USA
Soroush Elyasi, University of West London, UK
MohammadHossein LotfiNia, Azad University, Iran
Hamed Farbeh, Amirkabir University of Technology
EdgeAI and TinyML · Memory Technologies · AI Accelerators · Neuromorphic Computing · Internet-of-Things