🤖 AI Summary
This paper addresses the design challenges of multi-party conversational agents (MPCAs), which require simultaneous modeling of participants’ mental states, semantic understanding, and behavioral prediction. We adopt Theory of Mind (ToM) as the foundational paradigm and systematically survey three core challenges (mental state modeling, semantic comprehension, and action decision-making), tracing the technical evolution from conventional models to large language models (LLMs) and multimodal integration. We introduce, for the first time, a three-dimensional evaluation framework encompassing sociality, linguistic competence, and interactivity, identifying critical bottlenecks in current approaches. Our analysis underscores multimodal understanding as a pivotal unresolved direction. The work establishes a theoretical foundation and a systematic roadmap for developing socially intelligent, group-level dialogue systems.
📝 Abstract
Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, MPCAs pose additional design challenges because they must interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each participant's mental states? (State of Mind Modeling); 2) Can they properly understand the dialogue content? (Semantic Understanding); and 3) Can they reason about and predict future conversation flow? (Agent Action Modeling). We review methods ranging from classical machine learning to Large Language Models (LLMs) and multi-modal systems. Our analysis underscores Theory of Mind (ToM) as essential for building intelligent MPCAs and highlights multi-modal understanding as a promising yet underexplored direction. Finally, this survey offers guidance to future researchers on developing more capable MPCAs.