🤖 AI Summary
This study addresses the lack of personalized, real-time, emotionally adaptive self-dialogue tools in mental health interventions by proposing the first voice-cloned self-speech system for emotion regulation. The method integrates end-to-end text-to-speech (TTS), large language model (LLM)-driven empathic dialogue generation, and dynamic prosodic feature modulation to deliver psychologically supportive feedback in the user’s own voice. Its key contribution is pioneering the use of self-voice deepfake technology at the intersection of human–computer interaction (HCI) and mental health, enabling immersive “self-to-self” positive self-talk interventions. A user study (N=62) showed statistically significant improvements: increased willingness to self-disclose (p<0.01), reduced intensity of negative thinking (Cohen’s d=0.82), and a 27.3% average increase in emotional well-being scores (p<0.001). These results empirically validate self-voice-based emotion regulation as a novel therapeutic paradigm.
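The summary does not name the components behind this pipeline, so the following is a minimal sketch of one way to wire an LLM to a voice-cloning TTS model. The library choices (OpenAI client, Coqui TTS with XTTS v2), the model name, the system prompt, and all file names are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the described pipeline: an LLM generates an
# empathic reply, which a voice-cloning TTS model speaks in the user's
# own voice. Library and model choices are assumptions for illustration.
from openai import OpenAI
from TTS.api import TTS

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # voice-cloning TTS

SYSTEM_PROMPT = (  # hypothetical prompt; the paper's actual prompt is not given
    "You are the user's supportive inner voice. Respond in the first person "
    "with brief, empathic statements that reframe negative thoughts."
)

def self_talk_turn(user_text: str, voice_sample: str, out_wav: str) -> str:
    """Generate an empathic reply and synthesize it in the user's cloned voice."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    )
    reply = resp.choices[0].message.content
    # XTTS clones the speaker's timbre from a short reference recording.
    tts.tts_to_file(text=reply, speaker_wav=voice_sample,
                    language="en", file_path=out_wav)
    return reply

print(self_talk_turn("I keep thinking I'll fail tomorrow.",
                     voice_sample="my_voice_sample.wav",
                     out_wav="inner_self_reply.wav"))
```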
📝 Abstract
One's own voice is one of the most frequently heard voices. Studies found that hearing and talking to oneself have positive psychological effects. However, the design and implementation of self-voice for emotional regulation in HCI have yet to be explored. In this paper, we introduce InnerSelf, an innovative voice system based on speech synthesis technologies and the Large Language Model. It allows users to engage in supportive and empathic dialogue with their deepfake voice. By manipulating positive self-talk, our system aims to promote self-disclosure and regulation, reshaping negative thoughts and improving emotional well-being.