Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication

📅 2025-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited expressive capacity and weak emotional conveyance of Augmentative and Alternative Communication (AAC) systems, this paper introduces Speak Ease, an AAC system that integrates multimodal input (text, voice, and contextual cues such as the conversational partner and emotional tone) with large language models (LLMs). It combines automatic speech recognition (ASR), context-aware LLM-based output generation, and personalized text-to-speech (TTS) to produce more personalized, natural-sounding, and expressive speech. An exploratory feasibility study and a focus group with speech and language pathologists (SLPs) assessed the system's potential. The findings highlight the priorities and needs of AAC users and indicate that Speak Ease can enhance communicative naturalness, personalization, and affective appropriateness by supporting more contextually relevant communication.

📝 Abstract

In this paper, we present Speak Ease: an augmentative and alternative communication (AAC) system to support users' expressivity by integrating multimodal input, including text, voice, and contextual cues (conversational partner and emotional tone), with large language models (LLMs). Speak Ease combines automatic speech recognition (ASR), context-aware LLM-based outputs, and personalized text-to-speech technologies to enable more personalized, natural-sounding, and expressive communication. Through an exploratory feasibility study and focus group evaluation with speech and language pathologists (SLPs), we assessed Speak Ease's potential to enable expressivity in AAC. The findings highlight the priorities and needs of AAC users and the system's ability to enhance user expressivity by supporting more personalized and contextually relevant communication. This work provides insights into the use of multimodal inputs and LLM-driven features to improve AAC systems and support expressivity.
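The flow the abstract describes (optional ASR, then a context-aware LLM step, then personalized TTS) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Context`, `build_prompt`, and `run_pipeline`, the prompt wording, and the three callables standing in for the ASR, LLM, and TTS components are all hypothetical, chosen only to show how the stages might connect.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Context:
    """Contextual cues named in the abstract (field names are assumptions)."""
    partner: str         # who the user is talking to, e.g. "doctor"
    emotional_tone: str  # tone the user wants to convey, e.g. "calm"


def build_prompt(user_input: str, ctx: Context) -> str:
    """Fold the user's short input and the contextual cues into one LLM prompt."""
    return (
        "Expand the AAC user's input into one natural sentence.\n"
        f"Conversational partner: {ctx.partner}\n"
        f"Emotional tone: {ctx.emotional_tone}\n"
        f"Input: {user_input.strip()}"
    )


def run_pipeline(
    user_input: Union[str, bytes],
    ctx: Context,
    transcribe: Callable[[bytes], str],  # stand-in for an ASR component
    generate: Callable[[str], str],      # stand-in for an LLM call
    synthesize: Callable[[str], bytes],  # stand-in for personalized TTS
) -> bytes:
    """Run ASR (only for audio input), then the LLM step, then TTS."""
    # Typed text skips ASR; recorded audio is transcribed first.
    text = transcribe(user_input) if isinstance(user_input, bytes) else user_input
    message = generate(build_prompt(text, ctx))
    return synthesize(message)
```

Passing the three stages in as callables keeps the sketch independent of any particular ASR, LLM, or TTS provider, so each can be replaced by a stub when trying it out or by a real service in a prototype.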

Problem

Research questions and friction points this paper is trying to address.

Enhancing expressivity in AAC through multimodal input and LLMs

Personalizing communication with ASR, LLMs, and text-to-speech technologies

Improving AAC systems by integrating context-aware and emotional cues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates multimodal input with LLMs

Combines ASR, LLM, and TTS technologies

Enhances expressivity via contextual relevance
