🤖 AI Summary
This study addresses the difficulty that individuals with weak visual imagery abilities face when using mental imagery for self-guided goal pursuit. We propose the Emotional Self-Voice (ESV) intervention paradigm: affective, identity-congruent auditory feedback generated by combining zero-shot voice cloning with emotionally expressive language models, so that responses are rendered in the user's own voice while conveying target emotional states. A human-computer interaction experiment (N=60) compared ESV against text-only and mental-imagery conditions. The key contribution is a novel use of AI-synthesized “self-voice” as an identity-based behavioral nudge that does not rely on visual mental imagery. Resilience, self-confidence, motivation, and goal commitment increased across all three conditions, and participants perceived the ESV condition as uniquely engaging and personalized. This work points toward generated self-voice systems as a personalized behavioral intervention for self-regulation.
📝 Abstract
Emotions, shaped by past experiences, significantly influence decision-making and goal pursuit. Traditional cognitive-behavioral techniques for personal development rely on mental imagery to envision ideal selves, but may be less effective for individuals who struggle with visualization. This paper introduces Emotional Self-Voice (ESV), a novel system combining emotionally expressive language models and voice cloning technologies to render customized responses in the user's own voice. We investigate the potential of ESV to nudge individuals towards their ideal selves in a study with 60 participants. Across all three conditions (ESV, text-only, and mental imagination), we observed an increase in resilience, confidence, motivation, and goal commitment, and the ESV condition was perceived as uniquely engaging and personalized. We discuss the implications of designing generated self-voice systems as a personalized behavioral intervention for different scenarios.