🤖 AI Summary
This work addresses the challenge of dynamically adapting to teammates’ behaviors and goals in multi-agent collaboration under zero prior knowledge. We propose an online joint decision-making framework endowed with theory-of-mind (ToM) capabilities grounded in formal cognitive modeling. Methodologically, we introduce the first integration of meta-learned ToM inference with multi-agent denoising diffusion policies, yielding an end-to-end model that jointly infers teammates’ latent goals and generates cooperative trajectories in real time. Furthermore, we design an online dynamic replanning mechanism based on state-deviation detection, enabling millisecond-level trajectory updates. Evaluated on a simulated cooking task with no prior teammate specifications, our approach reduces resource consumption by 23.6% while maintaining a 98.4% team task success rate. These results empirically validate the critical role of jointly modeling real-time observations and ToM for adaptive multi-agent coordination.
📝 Abstract
In this paper we present ToMCAT (Theory-of-Mind for Cooperative Agents in Teams), a new framework for generating ToM-conditioned trajectories. It combines a meta-learning mechanism, that performs ToM reasoning over teammates' underlying goals and future behavior, with a multiagent denoising-diffusion model, that generates plans for an agent and its teammates conditioned on both the agent's goals and its teammates' characteristics, as computed via ToM. We implemented an online planning system that dynamically samples new trajectories (replans) from the diffusion model whenever it detects a divergence between a previously generated plan and the current state of the world. We conducted several experiments using ToMCAT in a simulated cooking domain. Our results highlight the importance of the dynamic replanning mechanism in reducing the usage of resources without sacrificing team performance. We also show that recent observations about the world and teammates' behavior collected by an agent over the course of an episode combined with ToM inferences are crucial to generate team-aware plans for dynamic adaptation to teammates, especially when no prior information is provided about them.