Latent Theory of Mind: A Decentralized Diffusion Architecture for Cooperative Manipulation

📅 2025-05-14
🤖 AI Summary
Problem: Enabling multiple robotic arms to coordinate in a decentralized way, without explicit inter-agent communication, remains a fundamental challenge in multi-agent robotics. Method: This paper proposes a decentralized diffusion policy architecture that uses sheaf theory to model the consensus structure among agents. It introduces a first-order cohomology loss that enforces sheaf-consistent alignment of the agents' consensus embeddings in latent space, and adds a directional consensus mechanism and a theory-of-mind decoder to preserve the expressiveness of those embeddings. From its private observations, each agent learns both an ego (individual) embedding and a consensus (shared) embedding, and its theory-of-mind decoder infers the partner's ego embedding from its own consensus embedding. When communication is available, the consensus embeddings can additionally be exchanged and aligned via the sheaf Laplacian. Contribution/Results: To our knowledge, this is the first work to embed sheaf theory in a multi-agent policy learning framework for communication-free distributed coordination. In hardware experiments on dual-arm cooperative tasks, the method performs comparably to a state-of-the-art centralized diffusion policy and outperforms a naive decentralized diffusion baseline, while supporting fully distributed execution.
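The theory-of-mind decoder can be made concrete with a minimal sketch. It rests on assumptions that do not come from the paper: the decoder is a single linear map W from an agent's consensus embedding to its partner's ego embedding, the embeddings are fixed synthetic vectors rather than outputs of trained encoders, and W is fit by plain per-sample stochastic gradient descent on mean squared error. The names `decode`, `train_tom_decoder`, and `make_synthetic_pairs` are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def decode(W: Matrix, consensus: Vector) -> Vector:
    """Hypothetical linear ToM decoder: own consensus embedding -> predicted partner ego embedding."""
    return [sum(w * c for w, c in zip(row, consensus)) for row in W]


def mse(pred: Vector, target: Vector) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mean_loss(W: Matrix, pairs: List[Tuple[Vector, Vector]]) -> float:
    return sum(mse(decode(W, c), e) for c, e in pairs) / len(pairs)


def train_tom_decoder(pairs, ego_dim: int, cons_dim: int,
                      lr: float = 0.1, epochs: int = 200) -> Matrix:
    """Fit W by per-sample SGD on the MSE between predicted and actual partner ego embeddings."""
    W = [[0.0] * cons_dim for _ in range(ego_dim)]
    for _ in range(epochs):
        for consensus, partner_ego in pairs:
            pred = decode(W, consensus)
            for r in range(ego_dim):
                # d(mse)/dW[r][k] = 2 / ego_dim * (pred[r] - target[r]) * consensus[k]
                scale = 2.0 * (pred[r] - partner_ego[r]) / ego_dim
                for k in range(cons_dim):
                    W[r][k] -= lr * scale * consensus[k]
    return W


def make_synthetic_pairs(n: int, seed: int = 0):
    """Synthetic data (assumption): partner ego is a fixed linear function of own consensus."""
    rng = random.Random(seed)
    W_true = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
    pairs = []
    for _ in range(n):
        c = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        pairs.append((c, decode(W_true, c)))
    return pairs


pairs = make_synthetic_pairs(50)
W0 = [[0.0] * 3 for _ in range(2)]
W = train_tom_decoder(pairs, ego_dim=2, cons_dim=3)
print(f"loss before: {mean_loss(W0, pairs):.4f}, after: {mean_loss(W, pairs):.2e}")
```

A real decoder would more likely be a neural network trained alongside the encoders during centralized training; the linear, frozen-embedding setup here only makes the regression target (partner ego from own consensus) explicit.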

📝 Abstract
We present Latent Theory of Mind (LatentToM), a decentralized diffusion policy architecture for collaborative robot manipulation. Our policy allows multiple manipulators, each with its own perception and computation, to collaborate towards a common task goal with or without explicit communication. Our key innovation lies in allowing each agent to maintain two latent representations: an ego embedding specific to the robot, and a consensus embedding trained to be common to both robots, despite their different sensor streams and poses. We further let each robot train a decoder to infer the other robot's ego embedding from its own consensus embedding, akin to theory of mind in latent space. Training occurs centrally, with all the policies' consensus encoders supervised by a loss inspired by sheaf theory, a mathematical theory for clustering data on a topological manifold. Specifically, we introduce a first-order cohomology loss to enforce sheaf-consistent alignment of the consensus embeddings. To preserve the expressiveness of the consensus embedding, we further propose structural constraints based on theory of mind and a directional consensus mechanism. Execution can be fully distributed, requiring no explicit communication between policies. In that case, information is exchanged implicitly through each robot's sensor stream, as it observes the other robots' actions and their effects on the scene. Alternatively, execution can leverage direct communication to share the robots' consensus embeddings; the embeddings are shared once during each inference step and aligned using the sheaf Laplacian. In our hardware experiments, LatentToM outperforms a naive decentralized diffusion baseline and performs comparably to a state-of-the-art centralized diffusion policy for bi-manual manipulation. Project website: https://stanfordmsl.github.io/LatentToM/.
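For the communication-enabled mode, one plausible reading of "aligned using the sheaf Laplacian" is a step of sheaf diffusion, x ← x − α L_F x, where L_F = δᵀδ is built from restriction maps on each edge of the agent graph. This sketch is an assumption, not the paper's procedure: the identity restriction maps, the single edge between two robots, the step size α, and the names `sheaf_laplacian_apply` and `sheaf_diffusion_step` are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[str, str]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_t_vec(A: Matrix, y: Vector) -> Vector:
    """Compute A^T y without forming the transpose."""
    return [sum(A[r][k] * y[r] for r in range(len(A))) for k in range(len(A[0]))]


def sheaf_laplacian_apply(x: Dict[str, Vector], edges: List[Edge],
                          F: Dict[Tuple[str, Edge], Matrix]) -> Dict[str, Vector]:
    """Return L_F x = delta^T delta x for a cellular sheaf on a graph.

    For an oriented edge e = (u, v): (delta x)_e = F[v, e] x_v - F[u, e] x_u.
    """
    out = {node: [0.0] * len(vec) for node, vec in x.items()}
    for e in edges:
        u, v = e
        disagreement = [b - a for a, b in
                        zip(matvec(F[(u, e)], x[u]), matvec(F[(v, e)], x[v]))]
        # delta^T sends the edge disagreement back to both endpoints with opposite signs.
        back_u = mat_t_vec(F[(u, e)], disagreement)
        back_v = mat_t_vec(F[(v, e)], disagreement)
        out[u] = [o - b for o, b in zip(out[u], back_u)]
        out[v] = [o + b for o, b in zip(out[v], back_v)]
    return out


def sheaf_diffusion_step(x, edges, F, alpha: float):
    """One explicit Euler step of sheaf diffusion: x <- x - alpha * L_F x."""
    Lx = sheaf_laplacian_apply(x, edges, F)
    return {node: [xi - alpha * li for xi, li in zip(x[node], Lx[node])] for node in x}


I2 = [[1.0, 0.0], [0.0, 1.0]]
edge = ("left", "right")
F = {("left", edge): I2, ("right", edge): I2}
x = {"left": [1.0, 2.0], "right": [3.0, 2.0]}
print(sheaf_diffusion_step(x, [edge], F, alpha=0.5))
# → {'left': [2.0, 2.0], 'right': [2.0, 2.0]}
```

With identity restriction maps this reduces to ordinary graph-Laplacian averaging. For a single edge, L_F has eigenvalues 0 and 2 per embedding coordinate, so any 0 < α < 1 shrinks the disagreement and α = 0.5 reaches the average in one step. Non-identity restriction maps would let the robots agree only on a projected part of their embeddings.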
Problem

Decentralized diffusion policy for collaborative robot manipulation
Latent representations for ego and consensus embeddings in robots
Theory of mind approach for implicit communication between robots
Innovation

Decentralized diffusion policy for robot collaboration
Dual latent representations: ego and consensus embeddings
Sheaf theory-inspired loss for embedding alignment
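A sheaf-inspired alignment loss of this kind can be sketched as the squared norm of the sheaf coboundary, ‖δx‖², which is zero exactly when the consensus embeddings form a global section, i.e. the restricted embeddings agree on every edge. Whether this matches the paper's first-order cohomology loss term for term is an assumption; the two-robot graph, the restriction maps, and the names `coboundary` and `sheaf_consistency_loss` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[str, str]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def coboundary(x: Dict[str, Vector], edges: List[Edge],
               F: Dict[Tuple[str, Edge], Matrix]) -> Dict[Edge, Vector]:
    """(delta x)_e = F[v, e] x_v - F[u, e] x_u for each oriented edge e = (u, v)."""
    out = {}
    for e in edges:
        u, v = e
        fu = matvec(F[(u, e)], x[u])
        fv = matvec(F[(v, e)], x[v])
        out[e] = [b - a for a, b in zip(fu, fv)]
    return out


def sheaf_consistency_loss(x, edges, F) -> float:
    """Squared norm of the coboundary; zero iff x is a global section of the sheaf."""
    return sum(c * c for vec in coboundary(x, edges, F).values() for c in vec)


edge = ("left", "right")
I2 = [[1.0, 0.0], [0.0, 1.0]]
F = {("left", edge): I2, ("right", edge): I2}
print(sheaf_consistency_loss({"left": [1.0, 2.0], "right": [1.0, 2.5]}, [edge], F))  # 0.25

# Projection restriction maps: the robots only need to agree on the first coordinate.
P = [[1.0, 0.0]]
F_proj = {("left", edge): P, ("right", edge): P}
print(sheaf_consistency_loss({"left": [1.0, 2.0], "right": [1.0, 9.0]}, [edge], F_proj))  # 0.0
```

In training, a term like this would presumably be computed on batched embeddings with an autodiff framework and combined with the policy's diffusion loss; that combination is likewise an assumption rather than a detail stated on this page.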