🤖 AI Summary
In virtual reality (VR), response latency of several seconds, for example when large language models (LLMs) compute the answers, can make interaction with embodied conversational agents (ECAs) feel unnatural or frustrating. To mitigate this, we compare four strategies displayed during the delay: a multimodal behavioral filler (conversational and gestural fillers), a baseline with only idle motions, and two symbolic progress indicators (a progress bar embedded as a badge on the agent, and an external one visualized as a thinking bubble). A within-subject VR study with 24 participants indicates that the behavioral filler improved perceived response time, three subscales of presence, humanlikeness, and naturalness, and was preferred by the majority of participants. With the symbolic indicators, in contrast, participants looked away from the agent's face more often, and the visualizations did not lead to a more positive impression of the agent or to increased presence. These results suggest that embodied behavioral fillers mitigate response latency in immersive settings more effectively than symbolic progress cues.
📝 Abstract
When communicating with embodied conversational agents (ECAs) in virtual reality, the agents' responses may be delayed by several seconds, for example, because answers take longer to compute when large language models are used. Such delays might lead to unnatural or frustrating interactions. In this paper, we investigate filler types intended to mitigate these effects and to foster a more positive experience and perception of the agent. In a within-subject study, we asked 24 participants to communicate with ECAs in virtual reality, comparing four strategies displayed during the delays: a multimodal behavioral filler consisting of conversational and gestural fillers, a baseline condition with only idle motions, and two symbolic indicators with progress bars, one embedded as a badge on the agent and the other external, visualized as a thinking bubble. Our results indicate that the behavioral filler improved perceived response time, three subscales of presence, humanlikeness, and naturalness. Participants looked away from the agent's face more often when symbolic indicators were displayed, but the visualizations did not lead to a more positive impression of the agent or to increased presence. The majority of participants preferred the behavioral filler; only 12.5% and 4.2% favored the symbolic embedded and external conditions, respectively.