Scaling behavior of large language models in emotional safety classification across sizes and tasks

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the safety classification capability of large language models (LLMs) for emotion-sensitive content in mental health contexts, focusing on ternary emotional safety classification (safe/unsafe/borderline) and multi-label identification across six risk categories. Method: We introduce a newly constructed mental health safety dataset and systematically evaluate the scaling behavior of the LLaMA series (1B–70B) under zero-shot, few-shot, and fine-tuning paradigms. To enhance data quality, we incorporate ChatGPT-generated affective rephrasing prompts. Contribution/Results: Lightweight fine-tuning of small models (e.g., 1B) achieves performance on par with BERT and larger LLMs for high-frequency classes, requiring <2 GB GPU memory during inference. In contrast, larger models demonstrate superior performance in multi-label and zero-shot settings. These findings provide empirically validated, practical pathways for deploying resource-efficient, privacy-preserving, on-device emotional safety systems in clinical and low-resource environments.
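The ternary setup described above can be sketched as a zero-shot prompt plus robust label parsing. This is an illustrative sketch, not the authors' code: the prompt wording and function names are assumptions, and ambiguous completions default to "borderline" so they can be routed to human review.

```python
import re

# Hypothetical sketch of zero-shot ternary emotional safety classification
# (safe / unsafe / borderline). Prompt text and names are illustrative
# assumptions, not the paper's actual template.

LABELS = ("safe", "unsafe", "borderline")

def build_zero_shot_prompt(text: str) -> str:
    """Compose a zero-shot classification prompt for an instruction-tuned LLM."""
    return (
        "Classify the emotional safety of the following message as exactly "
        "one of: safe, unsafe, or borderline.\n\n"
        f"Message: {text}\n"
        "Label:"
    )

def parse_label(model_output: str) -> str:
    """Map a free-form completion to a recognized label.

    Word boundaries prevent 'safe' from matching inside 'unsafe'; anything
    unrecognized falls back to 'borderline' for human review.
    """
    match = re.search(r"\b(unsafe|safe|borderline)\b", model_output.lower())
    return match.group(1) if match else "borderline"

print(parse_label("Label: UNSAFE - mentions self-harm"))  # unsafe
```

The conservative fallback is a design choice: in a safety-critical pipeline, an unparseable model response should trigger escalation rather than silently pass as "safe".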

📝 Abstract
Understanding how large language models (LLMs) process emotionally sensitive content is critical for building safe and reliable systems, particularly in mental health contexts. We investigate the scaling behavior of LLMs on two key tasks: trinary classification of emotional safety (safe vs. unsafe vs. borderline) and multi-label classification using a six-category safety risk taxonomy. To support this, we construct a novel dataset by merging several human-authored mental health datasets (> 15K samples) and augmenting them with emotion re-interpretation prompts generated via ChatGPT. We evaluate four LLaMA models (1B, 3B, 8B, 70B) across zero-shot, few-shot, and fine-tuning settings. Our results show that larger LLMs achieve stronger average performance, particularly in nuanced multi-label classification and in zero-shot settings. However, lightweight fine-tuning allowed the 1B model to achieve performance comparable to larger models and BERT in several high-data categories, while requiring <2GB VRAM at inference. These findings suggest that smaller, on-device models can serve as viable, privacy-preserving alternatives for sensitive applications, offering the ability to interpret emotional context and maintain safe conversational boundaries. This work highlights key implications for therapeutic LLM applications and the scalable alignment of safety-critical systems.
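The multi-label evaluation over the six-category risk taxonomy can be illustrated with a small metrics sketch. This is not the authors' code, and the six category names below are placeholders (the paper's actual taxonomy labels are not reproduced here); it simply shows per-category F1 with an unweighted macro average, the kind of comparison the abstract draws between model sizes.

```python
# Illustrative sketch: per-category and macro F1 for a six-category
# multi-label safety taxonomy. Category names are placeholders.

CATEGORIES = [f"risk_{i}" for i in range(1, 7)]

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def multilabel_f1(y_true: list, y_pred: list) -> dict:
    """Per-category F1 plus an unweighted macro average over all categories."""
    scores = {}
    for c in CATEGORIES:
        tp = sum(c in t and c in p for t, p in zip(y_true, y_pred))
        fp = sum(c not in t and c in p for t, p in zip(y_true, y_pred))
        fn = sum(c in t and c not in p for t, p in zip(y_true, y_pred))
        scores[c] = f1(tp, fp, fn)
    scores["macro"] = sum(scores[c] for c in CATEGORIES) / len(CATEGORIES)
    return scores

truth = [{"risk_1"}, {"risk_1", "risk_2"}, set()]
preds = [{"risk_1"}, {"risk_2"}, {"risk_3"}]
print(round(multilabel_f1(truth, preds)["macro"], 3))  # 0.278
```

Macro averaging weights rare categories equally with frequent ones, which is why the abstract's distinction between "high-data categories" (where the fine-tuned 1B model is competitive) and overall multi-label performance (where larger models lead) matters.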
Problem

Research questions and friction points this paper is trying to address.

Scaling behavior of LLMs in emotional safety classification tasks
Performance comparison across model sizes and training settings
Viability of smaller models for privacy-sensitive emotional context applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic scaling evaluation of the LLaMA series (1B–70B)
Merged human-authored datasets augmented with ChatGPT-generated affective rephrasings
Lightweight fine-tuning for privacy preservation
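The augmentation idea above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the rephrasing prompt wording is an assumption, and the `rephrase` callable stands in for a real ChatGPT API call; labels are carried over unchanged so augmented samples inherit their source annotations.

```python
# Sketch (assumed, not the authors' code) of affective-rephrasing data
# augmentation: each sample is paired with a rephrased copy under the
# same safety label.

def rephrasing_prompt(text: str) -> str:
    """Illustrative prompt for an LLM-based rephrasing step."""
    return (
        "Rewrite the following message so that it expresses the same "
        "emotional content in different words, preserving its "
        "safety-relevant meaning. Return only the rewritten message.\n\n"
        f"Message: {text}"
    )

def augment(dataset: list, rephrase) -> list:
    """Return the dataset with one rephrased copy per sample, labels kept."""
    out = []
    for sample in dataset:
        out.append(sample)
        out.append({"text": rephrase(sample["text"]), "label": sample["label"]})
    return out

# Stand-in for a call to a ChatGPT-style API:
demo = augment([{"text": "I feel hopeless.", "label": "unsafe"}],
               rephrase=lambda t: f"(rephrased) {t}")
print(len(demo))  # 2
```

Keeping the original label on the rephrased copy assumes the rewrite preserves safety-relevant meaning; in practice such augmented samples would typically be spot-checked before training.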
Edoardo Pinzuti
Leibniz Institute for Resilience Research, Mainz, Germany; Brain Imaging Center, Frankfurt am Main
neuroscience · information theory
Oliver Tüscher
Leibniz Institute for Resilience Research, Mainz, Germany; Department of Psychiatry, Psychotherapy and Psychosomatic Medicine, University Medical Center Halle, Halle (Saale), Germany; German Center for Mental Health (DZPG), Site Halle-Jena-Magdeburg, Halle (Saale), Germany; Department of Psychiatry and Psychotherapy, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
André Ferreira Castro
School of Life Sciences, Technical University of Munich, Freising 85354, Germany