🤖 AI Summary
This study addresses the limited contextual understanding and affective awareness of large language models (e.g., GPT-2) in mental health support dialogues. We propose a novel method integrating structured input reconstruction with multi-component reinforcement learning (MRL), explicitly modeling user utterances, dialogue history, and fine-grained emotional states. A multi-objective reward function jointly optimizes contextual coherence, affective consistency, and clinical appropriateness, enabling synergistic supervised fine-tuning and RL-based training. Experiments demonstrate a significant improvement in emotion recognition accuracy—from 66.96% to 99.34%—and consistent gains across generation metrics (BLEU, ROUGE, METEOR) over all baselines. Evaluation by LLM-based annotators confirms high response relevance and clinical plausibility. To our knowledge, this is the first work to systematically incorporate MRL into psychotherapeutic dialogue generation, substantially enhancing the model’s joint situational–affective modeling capability.
📝 Abstract
Mental illness represents a substantial global socioeconomic burden, with COVID-19 further exacerbating accessibility challenges and driving increased demand for telehealth mental health support. While large language models (LLMs) offer promising solutions through 24/7 availability and non-judgmental interactions, pre-trained models often lack the contextual and emotional awareness necessary for appropriate therapeutic responses. This paper investigated the application of supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance GPT-2's capacity for therapeutic dialogue generation. The methodology restructured input formats to enable simultaneous processing of contextual information and emotional states alongside user input, and employed a multi-component reward function that aligned model outputs with professional therapist responses and annotated emotions. Results demonstrated improvements through reinforcement learning over baseline GPT-2 across multiple evaluation metrics: BLEU (0.0111), ROUGE-1 (0.1397), ROUGE-2 (0.0213), ROUGE-L (0.1317), and METEOR (0.0581). LLM-based evaluation confirmed high contextual relevance and professionalism, and reinforcement learning achieved 99.34% emotion accuracy compared to 66.96% for baseline GPT-2. These findings demonstrate reinforcement learning's effectiveness in developing therapeutic dialogue systems that can serve as valuable assistive tools for therapists while maintaining essential human clinical oversight.
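The multi-component reward described above, combining alignment with therapist responses and with annotated emotions, could be sketched as a weighted sum of per-objective terms. The component functions, weights, and names below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of a multi-component RL reward: a weighted sum of
# contextual-coherence, affective-consistency, and clinical-appropriateness
# terms. All helpers and weights here are placeholders, not the paper's code.

def contextual_coherence(response_tokens, reference_tokens):
    """Toy proxy for coherence: unigram precision against the
    professional therapist reference (a BLEU-1-like overlap)."""
    if not response_tokens:
        return 0.0
    ref = set(reference_tokens)
    return sum(tok in ref for tok in response_tokens) / len(response_tokens)

def affective_consistency(predicted_emotion, annotated_emotion):
    """1.0 if the generated emotion label matches the annotation, else 0.0."""
    return 1.0 if predicted_emotion == annotated_emotion else 0.0

def multi_component_reward(response_tokens, reference_tokens,
                           predicted_emotion, annotated_emotion,
                           clinical_score,
                           w_ctx=0.4, w_emo=0.4, w_clin=0.2):
    """Scalar reward fed to the RL update: a weighted combination of
    the three objectives (weights are illustrative)."""
    return (w_ctx * contextual_coherence(response_tokens, reference_tokens)
            + w_emo * affective_consistency(predicted_emotion, annotated_emotion)
            + w_clin * clinical_score)

# Example: a response overlapping 2/3 of the reference tokens, a correct
# emotion label, and an externally supplied clinical-appropriateness score.
reward = multi_component_reward(
    ["i", "hear", "you"], ["i", "hear", "that"],
    "sadness", "sadness", clinical_score=0.8)
```

In practice each term would come from a learned scorer (e.g., an emotion classifier for the affective term), and the scalar reward would drive a policy-gradient update of the fine-tuned GPT-2.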