Expanding Spatial and Temporal Context for Robotic Imitation Learning With Scene Graphs

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation of imitation learning in large-scale, partially observable environments, where agents often lack sufficient long-term spatiotemporal context. To mitigate this limitation, the authors propose a structured memory mechanism based on dynamic scene graphs. This approach introduces dynamic scene graphs into imitation learning for the first time, explicitly modeling spatiotemporal context by continuously recording object-centric relational structures and their temporal evolution. This enhances the agent’s capacity to reason over historical and spatial information. Experimental results show that the proposed architecture substantially improves policy performance in both simulated mobile manipulation and real-world tabletop manipulation, with particularly strong gains in settings that demand long-horizon reasoning and robust generalization.

📝 Abstract

Imitation learning enables robots to learn how to execute tasks via observation. However, real-world environments like homes and offices are often severely partially observed due to their large spatial scales. In addition, many tasks involve executing a series of subtasks, which requires autonomous robots to reason over extended time horizons. To address these challenges, we propose using scene graphs as an explicit and structured memory mechanism in imitation learning. By maintaining a dynamic scene graph that captures object-centric relationships and their evolution over time, our method allows the agent to retain relevant historical context during task execution and to reason efficiently over incrementally accrued scene information. Our experiments on simulated mobile manipulation and real-world tabletop manipulation demonstrate that our approach substantially improves policy performance, particularly in settings that demand long-term reasoning and robust generalization under partial observability.
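
The idea of a dynamic scene graph used as memory can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names (`ObjectNode`, `SceneGraphMemory`), the `(subject, relation, object)` edge format, and the retraction rule (an edge is dropped only when its subject is back in view and the relation is no longer observed) are all hypothetical choices made for illustration. The sketch shows only how object-centric relations can persist while objects are out of view and how their changes can be logged over time.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    name: str
    position: tuple[float, float, float]
    last_seen: int  # timestep of the most recent observation


@dataclass
class SceneGraphMemory:
    """Hypothetical scene-graph memory; an illustrative assumption, not the paper's design."""
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    # (subject, relation, object) -> timestep at which the relation was last observed
    edges: dict[tuple[str, str, str], int] = field(default_factory=dict)
    # Log of (timestep, "added" | "removed", edge) capturing how the graph evolves
    history: list[tuple[int, str, tuple[str, str, str]]] = field(default_factory=list)

    def update(self, t, detections, relations):
        """Merge one partial observation (visible objects and relations) into the graph."""
        for name, pos in detections.items():
            self.nodes[name] = ObjectNode(name, pos, t)
        visible, observed = set(detections), set(relations)
        # Assumed rule: retract an edge only if its subject is visible again
        # and the relation was not re-observed. Unseen objects keep their edges.
        for edge in list(self.edges):
            if edge[0] in visible and edge not in observed:
                del self.edges[edge]
                self.history.append((t, "removed", edge))
        for edge in relations:
            if edge not in self.edges:
                self.history.append((t, "added", edge))
            self.edges[edge] = t

    def relations_of(self, name):
        """Return all remembered relations that involve the given object."""
        return sorted(e for e in self.edges if name in (e[0], e[2]))


memory = SceneGraphMemory()
# t=0: the robot sees a mug on a table.
memory.update(0, {"mug": (1.0, 0.2, 0.8), "table": (1.0, 0.0, 0.7)},
              [("mug", "on", "table")])
# t=1: the robot is elsewhere and sees only a sink; the mug relation is retained.
memory.update(1, {"sink": (4.0, 2.0, 0.9)}, [])
remembered = memory.relations_of("mug")
# t=2: the mug is observed in the sink, so the stale relation is retracted.
memory.update(2, {"mug": (4.0, 2.0, 0.95), "sink": (4.0, 2.0, 0.9)},
              [("mug", "in", "sink")])
```

In a learned policy, a structure like this would presumably be encoded (for example by a graph neural network) and fed to the policy alongside the current observation; that step is outside the scope of this sketch.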

Problem

Research questions and friction points this paper is trying to address.

imitation learning

partial observability

long-term reasoning

spatial context

temporal context

Innovation

Methods, ideas, or system contributions that make the work stand out.

scene graphs

imitation learning

partial observability