🤖 AI Summary
Current large language models struggle to jointly model the cognitive impairments, emotional dynamics, and nonverbal behaviors of dementia patients in multi-turn dialogues, resulting in insufficient simulation fidelity. This work proposes DemMA, the first framework to unify a clinically informed cognitive-personality model with explicit nonverbal behavior modeling within a single architecture. DemMA integrates pathological traits, personality profiles, and subtype-specific memory states, and employs a chain-of-thought distillation mechanism to generate reasoning traces, verbal responses, and multimodal behaviors—such as facial expressions, gestures, and vocal prosody—in a single forward pass. Experimental results demonstrate that DemMA significantly outperforms strong baselines in both realism and computational efficiency, with consistent validation from clinical experts, medical students, and large language models.
📝 Abstract
Simulating dementia patients with large language models (LLMs) is challenging due to the need to jointly model cognitive impairment, emotional dynamics, and nonverbal behaviors over long conversations. We present DemMA, an expert-guided dementia dialogue agent for high-fidelity multi-turn patient simulation. DemMA constructs clinically grounded dementia personas by integrating pathology information, personality traits, and subtype-specific memory states informed by clinical experts. To move beyond text-only simulation, DemMA explicitly models nonverbal behaviors, including motion, facial expressions, and vocal cues. We further introduce a Chain-of-Thought distillation framework that trains a single LLM to jointly generate reasoning traces, patient utterances, and aligned behavioral actions within one forward pass, enabling efficient deployment without multi-agent inference. Extensive evaluations with experts, medical students, and LLM judges demonstrate that DemMA significantly outperforms strong baselines across multiple metrics.
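The single-forward-pass joint generation described above can be pictured as the model emitting one tagged sequence that a thin wrapper splits into reasoning, utterance, and behavior. This is a minimal sketch under assumed conventions: the tag names (`<think>`, `<say>`, `<act>`) and the JSON behavior schema are illustrative, not the paper's actual output format.

```python
import json
import re

def parse_joint_output(raw: str) -> dict:
    """Split one model generation into reasoning, utterance, and behaviors.
    Tag names are illustrative assumptions, not taken from DemMA itself."""
    sections = {}
    for tag in ("think", "say", "act"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", raw, re.DOTALL)
        sections[tag] = m.group(1).strip() if m else ""
    return {
        "reasoning": sections["think"],
        "utterance": sections["say"],
        # Behaviors (face/gesture/prosody) parsed as JSON when present.
        "behavior": json.loads(sections["act"]) if sections["act"] else {},
    }

# Hypothetical single-pass output from a distilled patient model:
raw = (
    "<think>Mild AD persona; recent-memory gap about a family visit; "
    "mood: mildly anxious.</think>"
    "<say>A visit? I... I don't remember anyone coming by yesterday.</say>"
    '<act>{"face": "confused frown", "gesture": "rubs hands", '
    '"prosody": "hesitant, slower pace"}</act>'
)
result = parse_joint_output(raw)
print(result["utterance"])
print(result["behavior"]["face"])
```

Because reasoning, speech, and behavior come from the same forward pass, the nonverbal channel stays aligned with the utterance without a second agent or model call, which is what makes deployment efficient.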