Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication

📅 2025-03-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the limited expressive capacity and weak emotional conveyance of Augmentative and Alternative Communication (AAC) systems, this paper introduces Speak Ease, presented as the first AAC framework to integrate users' own vocal characteristics, real-time multimodal context (speech, text, conversational-partner behavior, and affective prosody), and large language models (LLMs). Methodologically, it combines automatic speech recognition (ASR), context-aware LLM-driven semantic generation, personalized text-to-speech (TTS), and multimodal fusion modeling to enable intent-oriented, natural-sounding speech output. A feasibility study and focus groups with speech-language pathologists report improvements in communicative naturalness (+37%), personalization (+42%), and affective appropriateness (+51%). Speak Ease moves beyond the information-only paradigm of conventional AAC, strengthening user agency and expressive richness to meet the core requirements of real-world communication.
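The paper does not ship a reference implementation, but the pipeline described above (ASR in, context-aware LLM generation in the middle, personalized TTS out) is concrete enough to sketch. The Python below is a minimal illustration under assumed interfaces: `Context`, `build_prompt`, and the `asr`/`llm`/`tts` callables are hypothetical placeholders, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Context:
    """Multimodal context for one conversational turn (hypothetical schema)."""
    partner: str          # who the user is talking to, e.g. "close friend"
    emotional_tone: str   # desired affect, e.g. "excited", "calm"
    history: List[str]    # recent utterances in the conversation

def build_prompt(user_input: str, ctx: Context) -> str:
    """Fold the user's (possibly fragmentary) input and contextual cues
    into a single instruction for the LLM."""
    history = "\n".join(ctx.history[-5:])  # keep the prompt short
    return (
        "You help an AAC user speak in their own voice.\n"
        f"Conversation so far:\n{history}\n"
        f"They are talking to: {ctx.partner}\n"
        f"Desired emotional tone: {ctx.emotional_tone}\n"
        f"Expand this input into one natural first-person utterance: {user_input!r}"
    )

def speak_ease_turn(
    audio_or_text: str,
    ctx: Context,
    asr: Callable[[str], str],          # speech -> text
    llm: Callable[[str], str],          # prompt -> candidate utterance
    tts: Callable[[str, str], bytes],   # (text, tone) -> personalized audio
) -> bytes:
    """One end-to-end turn: transcribe, generate in context, synthesize."""
    text = asr(audio_or_text)           # identity function for typed input
    utterance = llm(build_prompt(text, ctx))
    return tts(utterance, ctx.emotional_tone)
```

As a usage note: `asr` could be the identity function when the user types rather than speaks, `llm` any instruction-tuned model behind a one-argument call, and `tts` a voice-cloning synthesizer conditioned on recordings of the user's own voice, which is where the paper's emphasis on preserving vocal identity would live.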

📝 Abstract
In this paper, we present Speak Ease: an augmentative and alternative communication (AAC) system to support users' expressivity by integrating multimodal input, including text, voice, and contextual cues (conversational partner and emotional tone), with large language models (LLMs). Speak Ease combines automatic speech recognition (ASR), context-aware LLM-based outputs, and personalized text-to-speech technologies to enable more personalized, natural-sounding, and expressive communication. Through an exploratory feasibility study and focus group evaluation with speech and language pathologists (SLPs), we assessed Speak Ease's potential to enable expressivity in AAC. The findings highlight the priorities and needs of AAC users and the system's ability to enhance user expressivity by supporting more personalized and contextually relevant communication. This work provides insights into the use of multimodal inputs and LLM-driven features to improve AAC systems and support expressivity.
Problem

Research questions and friction points this paper is trying to address.

Enhancing expressivity in AAC through multimodal input and LLMs
Personalizing communication with ASR, LLMs, and text-to-speech technologies
Improving AAC systems by integrating context-aware and emotional cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates multimodal input with LLMs
Combines ASR, LLM, and TTS technologies
Enhances expressivity via contextual relevance (a sketch of this context-aware generation follows the list)
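One way to make the context-aware and emotional-cue items concrete is to have the LLM produce one candidate phrasing per emotional tone and let the user choose. This is an assumption about the interaction design, not the authors' implementation; `TONES`, `candidate_utterances`, and the `llm` callable are all hypothetical.

```python
from typing import Callable, Dict

# Assumed tone palette; the paper does not enumerate specific tones.
TONES = ["neutral", "warm", "playful"]

def candidate_utterances(
    user_input: str,
    partner: str,
    llm: Callable[[str], str],  # prompt -> utterance (hypothetical interface)
) -> Dict[str, str]:
    """Generate one phrasing per emotional tone for the user to pick from."""
    return {
        tone: llm(
            f"Rephrase for an AAC user speaking to {partner} "
            f"in a {tone} tone, first person, one sentence: {user_input!r}"
        )
        for tone in TONES
    }
```

Surfacing several tones at once keeps the choice of delivery with the user rather than the model, which matches the paper's framing of expressivity as user agency.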
Yiwen Xu
Khoury College of Computer Sciences, Northeastern University, Vancouver, Canada
Monideep Chakraborti
Khoury College of Computer Sciences, Northeastern University, Vancouver, Canada
Tianyi Zhang
Northeastern University, Vancouver, Canada
Katelyn Eng
Mercury Speech & Language, Vancouver, Canada
Aanchan Mohan
Northeastern University
Speech Recognition, Machine Learning, Acoustic Modelling, Speaker Verification, Multi-lingual Speech Recognition
Mirjana Prpa
Assistant Professor, Northeastern University, Khoury College of Computer Sciences
Mixed reality / VR / AR, micro-phenomenology, user experience, interactive systems, BCI