MTIL: Encoding Full History with Mamba for Temporal Imitation Learning

📅 2025-05-18
🤖 AI Summary
Standard imitation learning relies on the Markov assumption, limiting its capacity to model long-range temporal dependencies—a critical shortcoming in robotic sequential manipulation tasks where disambiguating current observations requires rich historical context. To address this, we introduce Mamba—a state space model (SSM)—into imitation learning for the first time, proposing an end-to-end framework that models the full trajectory history. Our method recursively compresses the entire observation history, including multimodal inputs, into a hidden state via efficient recurrent propagation, enabling non-Markovian action prediction. Crucially, it abandons restrictive local-state assumptions and explicitly encodes long-horizon temporal dependencies. We evaluate on ACT dataset tasks, RoboMimic, LIBERO, and real-world sequential manipulation benchmarks, consistently outperforming both ACT and Diffusion Policy. The results demonstrate that holistic history modeling is essential for robust sequential decision-making in complex, context-sensitive robotic tasks.

📝 Abstract
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, limiting their applicability to tasks where historical context is crucial for disambiguating current observations. This limitation hinders performance in long-horizon sequential manipulation tasks where the correct action depends on past events not fully captured by the current state. To address this fundamental challenge, we introduce Mamba Temporal Imitation Learning (MTIL), a novel approach that leverages the recurrent state dynamics inherent in State Space Models (SSMs), specifically the Mamba architecture. MTIL encodes the entire trajectory history into a compressed hidden state, conditioning action predictions on this comprehensive temporal context alongside current multi-modal observations. Through extensive experiments on simulated benchmarks (ACT dataset tasks, RoboMimic, LIBERO) and real-world sequential manipulation tasks specifically designed to probe temporal dependencies, MTIL significantly outperforms state-of-the-art methods like ACT and Diffusion Policy. Our findings affirm the necessity of full temporal context for robust sequential decision-making and validate MTIL as a powerful approach that transcends the inherent limitations of Markovian imitation learning.
Problem

Research questions and friction points this paper is trying to address.

Overcoming Markov assumption limits in imitation learning
Enhancing action prediction with full trajectory history
Improving performance in long-horizon sequential manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Mamba architecture for temporal encoding
Encodes full trajectory history into hidden state
Conditions actions on multi-modal temporal context
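The core idea above—recursively folding every observation into a compressed hidden state so that action prediction depends on the whole trajectory rather than the current frame—can be sketched with a toy linear state-space recurrence. This is an illustrative simplification, not the authors' MTIL implementation (a real Mamba layer uses input-dependent, selective state dynamics); all class and variable names here are invented for the example.

```python
import numpy as np

class RecurrentSSMPolicy:
    """Toy non-Markovian policy in the spirit of MTIL (illustrative only).

    The hidden state h_t compresses the entire observation history:
        h_t = A h_{t-1} + B o_t      (recurrent history compression)
        a_t = C h_t + D o_t          (action conditioned on history + current obs)
    """

    def __init__(self, obs_dim, state_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Contractive state matrix keeps the recurrence stable over long horizons.
        self.A = 0.95 * np.eye(state_dim)
        self.B = rng.standard_normal((state_dim, obs_dim)) / np.sqrt(obs_dim)
        self.C = rng.standard_normal((act_dim, state_dim)) / np.sqrt(state_dim)
        self.D = rng.standard_normal((act_dim, obs_dim)) / np.sqrt(obs_dim)
        self.h = np.zeros(state_dim)

    def reset(self):
        # Start of a new trajectory: clear the compressed history.
        self.h = np.zeros_like(self.h)

    def step(self, obs):
        # Fold the new observation into the hidden state, then predict an action.
        self.h = self.A @ self.h + self.B @ obs
        return self.C @ self.h + self.D @ obs
```

Because the action depends on `h_t` and not just `o_t`, two identical current observations preceded by different histories yield different actions—exactly the disambiguation a Markovian policy cannot perform.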
Yulin Zhou
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Yuankai Lin
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Fanzhe Peng
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Jiahui Chen
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Zhuang Zhou
School of Future Technology, Huazhong University of Science and Technology
Kaiji Huang
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Hua Yang
Redrock Biometrics
Biometrics, Motion Tracking, Computer Vision, Augmented Reality, Image Processing
Zhouping Yin
Professor of Mechanical Science and Engineering, Huazhong University of Science and Technology
Electronic Manufacturing, Digital Modelling