mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies

📅 2025-09-24
🤖 AI Summary
To address task failures in end-to-end robotic control that occur when relevant objects are occluded or leave the robot's field of view, this paper proposes mindmap, a framework that combines semantic 3D environment reconstruction with a diffusion-based policy network to give 3D action policies spatial memory. mindmap represents spatial memory as deep feature maps: the policy generates robot trajectories from a semantic 3D reconstruction of the environment. Simulation experiments on manipulation tasks that require spatial memory show that mindmap solves tasks on which state-of-the-art approaches without memory mechanisms struggle. The authors release the reconstruction system, training code, and evaluation tasks.

πŸ“ Abstract
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
Problem

Research questions and friction points this paper is trying to address.

Enabling robots to remember spatial scene composition during manipulation tasks
Solving tasks requiring objects to pass in and out of view
Building memory mechanisms into robot learning systems effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses semantic 3D reconstruction for environment mapping
Implements 3D diffusion policy for trajectory generation
Integrates spatial memory through deep feature maps
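
The idea of keeping spatial memory in a feature map can be illustrated with a minimal sketch. This is not the authors' implementation: the names `backproject` and `FeatureVoxelMap`, the pinhole camera model, the sparse voxel grid, and the running-mean fusion rule are all assumptions chosen for illustration. The sketch lifts per-pixel image features into 3D using depth and camera pose, accumulates them in a world-frame map, and shows that the map still answers queries after the object has left the field of view.

```python
import numpy as np


def backproject(depth, feats, K, T_world_cam):
    """Lift per-pixel features to 3D world points (pinhole model, assumed).

    depth: (H, W) metric depth, 0 where invalid.
    feats: (H, W, C) per-pixel feature vectors.
    K: (3, 3) camera intrinsics. T_world_cam: (4, 4) camera-to-world pose.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    valid = z > 0
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)[valid]
    pts_world = (T_world_cam @ pts_cam.T).T[:, :3]
    return pts_world, feats.reshape(h * w, -1)[valid]


class FeatureVoxelMap:
    """Hypothetical sparse voxel map storing a running mean feature per voxel."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self.sums = {}    # voxel index -> summed feature vector
        self.counts = {}  # voxel index -> number of observations

    def integrate(self, points, feats):
        """Fuse new observations; voxels not observed now keep their old state."""
        keys = np.floor(points / self.voxel_size).astype(int)
        for key, f in zip(map(tuple, keys), feats):
            if key in self.sums:
                self.sums[key] = self.sums[key] + f
                self.counts[key] += 1
            else:
                self.sums[key] = f.astype(float).copy()
                self.counts[key] = 1

    def query(self, point):
        """Return the mean feature stored at a 3D point, or None if never observed."""
        key = tuple(np.floor(np.asarray(point) / self.voxel_size).astype(int))
        if key not in self.sums:
            return None
        return self.sums[key] / self.counts[key]
```

Because the map lives in the world frame, a frame with no valid depth (or a camera pointed elsewhere) leaves earlier observations untouched. A policy reading from such a map could therefore still locate an object it no longer sees, which is the behavior the bullets above describe.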
Authors

Remo Steiner
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Alexander Millane
NVIDIA (robotics)
David Tingdahl
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Clemens Volk
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Vikram Ramasamy
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Xinjie Yao
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Peter Du
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Soha Pouya
NVIDIA, Zurich, Switzerland. Santa Clara, California.
Shiwei Sheng
NVIDIA, Zurich, Switzerland. Santa Clara, California.