PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

📅 2026-06-11
🤖 AI Summary
This work addresses two limitations of existing empathetic spoken dialogue systems: cascaded architectures discard acoustic cues during speech-to-text conversion, and end-to-end models lack interpretable control over how emotion and knowledge are integrated. To address these problems, the authors propose a multi-agent framework that decouples speech perception, response generation, and speech synthesis into coordinated components. Central to the approach is a prosody-to-language translation mechanism that injects emotional prosodic information into the reasoning of a large language model; the framework can also invoke external knowledge tools on demand when generating empathetic responses. Experiments show consistent improvements over baseline systems on both objective and subjective metrics, including empathy, prosodic appropriateness, and text response quality. The decoupled design is intended to give interpretable control over how emotional expression, semantic content, and external knowledge are combined.
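To make the decoupled design concrete, here is a minimal Python sketch of a three-agent pipeline with on-demand tool use. It is not PRISM's implementation: every name (`perception_agent`, `reasoning_agent`, `synthesis_agent`, `TOOLS`, `pet_health_lookup`) is hypothetical, the perception step is a stub that receives a transcript and an emotion label directly instead of audio, and a keyword rule stands in for LLM-driven tool selection.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Perception:
    transcript: str  # output of a (hypothetical) ASR step
    emotion: str     # coarse emotion label inferred from prosody


@dataclass
class Response:
    text: str
    style: str  # speaking-style control passed to the synthesis agent
    used_tools: list[str] = field(default_factory=list)


def perception_agent(transcript: str, emotion: str) -> Perception:
    """Stub: a real perception agent would run ASR and prosody analysis on audio."""
    return Perception(transcript=transcript, emotion=emotion)


# Hypothetical knowledge-tool registry: tool name -> function returning a snippet.
TOOLS: dict[str, Callable[[str], str]] = {
    "pet_health_lookup": lambda query: "A cough that lasts overnight is worth a vet visit.",
}


def select_tool(p: Perception) -> Optional[str]:
    """Toy keyword rule standing in for LLM-driven, on-demand tool selection."""
    text = p.transcript.lower()
    if "dog" in text and text.rstrip().endswith("?"):
        return "pet_health_lookup"
    return None  # no external knowledge needed


def reasoning_agent(p: Perception) -> Response:
    """Combine emotion-aware phrasing with optionally retrieved knowledge."""
    tool = select_tool(p)
    knowledge = TOOLS[tool](p.transcript) if tool else ""
    opener = "That sounds really stressful." if p.emotion == "anxious" else "I hear you."
    style = "calm and reassuring" if p.emotion == "anxious" else "warm"
    return Response(text=f"{opener} {knowledge}".strip(), style=style,
                    used_tools=[tool] if tool else [])


def synthesis_agent(r: Response) -> dict[str, str]:
    """Stub: package the text plus an explicit style control for a TTS model."""
    return {"text": r.text, "style": r.style}


def run_pipeline(transcript: str, emotion: str) -> tuple[dict[str, str], list[str]]:
    """Run the three decoupled agents in sequence and report which tools fired."""
    response = reasoning_agent(perception_agent(transcript, emotion))
    return synthesis_agent(response), response.used_tools


if __name__ == "__main__":
    request, tools = run_pipeline("My dog coughed all night, what should I do?", "anxious")
    print(request, tools)
```

Keeping the agents as separate functions mirrors the stated motivation for decoupling: each intermediate (emotion label, tool choice, style tag) can be inspected or overridden, rather than living as hidden state inside a single end-to-end model.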
📝 Abstract
Empathetic spoken dialogue systems require not only semantically appropriate responses but also emotionally aligned prosodic expression. However, cascade pipelines often discard acoustic cues during speech-to-text conversion, while end-to-end speech models lack interpretable control over emotion and knowledge integration. To address these challenges, we propose PRISM, a multi-agent framework for empathetic spoken dialogue that decouples speech perception, response generation, and speech synthesis into coordinated components. PRISM introduces a prosody-to-language translation mechanism to stabilize large language model reasoning and enables on-demand invocation of external knowledge tools for empathetic dialogue generation. Experimental results demonstrate that PRISM achieves consistent improvements in empathy, prosodic appropriateness, and text response generation quality across objective and subjective metrics. Our code is available at: https://github.com/Bxzfrm/PRISM.
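The prosody-to-language idea can be illustrated with a minimal sketch in which numeric prosodic features are bucketed into words and placed in the LLM prompt beside the transcript. This is an illustration under stated assumptions, not PRISM's mechanism: the feature set (`ProsodyFeatures`), the thresholds, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProsodyFeatures:
    """Hypothetical utterance-level prosodic features (not PRISM's actual feature set)."""
    pitch_mean_hz: float    # average fundamental frequency
    pitch_std_hz: float     # pitch variability
    energy_db: float        # average loudness (RMS, dB relative to full scale)
    rate_syll_per_s: float  # speaking rate in syllables per second


def describe_level(value: float, low: float, high: float, words: tuple[str, str, str]) -> str:
    """Bucket a value into one of three words; thresholds are illustrative only."""
    if value < low:
        return words[0]
    if value > high:
        return words[2]
    return words[1]


def prosody_to_text(p: ProsodyFeatures) -> str:
    """Translate numeric prosody into a short natural-language description."""
    pitch = describe_level(p.pitch_mean_hz, 120.0, 220.0, ("low", "moderate", "high"))
    variation = describe_level(p.pitch_std_hz, 15.0, 45.0, ("flat", "normal", "highly varied"))
    loudness = describe_level(p.energy_db, -30.0, -15.0, ("quiet", "normal", "loud"))
    rate = describe_level(p.rate_syll_per_s, 3.0, 5.5, ("slow", "normal", "fast"))
    return (f"The speaker uses {pitch} pitch with {variation} intonation, "
            f"speaks at a {loudness} volume, and talks at a {rate} pace.")


def build_prompt(transcript: str, p: ProsodyFeatures) -> str:
    """Place the prosody description next to the transcript in the LLM prompt."""
    return (f"User said: \"{transcript}\"\n"
            f"Vocal delivery: {prosody_to_text(p)}\n"
            "Respond empathetically, taking both the words and the delivery into account.")


if __name__ == "__main__":
    feats = ProsodyFeatures(pitch_mean_hz=110.0, pitch_std_hz=10.0,
                            energy_db=-35.0, rate_syll_per_s=2.5)
    print(build_prompt("I finally finished the project.", feats))
```

Rendering prosody as plain text keeps the LLM's input in the modality it reasons over best, and the description can be read and audited directly, which is one plausible reading of why such a translation step would stabilize reasoning.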
Problem

Research questions and friction points this paper is trying to address.

empathetic spoken dialogue
prosody
speech perception
emotion integration
knowledge integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

prosody-to-language translation
multi-agent reasoning
empathetic spoken dialogue
decoupled speech processing
external knowledge integration