MTIL: Encoding Full History with Mamba for Temporal Imitation Learning

📅 2025-05-18
🤖 AI Summary
Standard imitation learning relies on the Markov assumption, limiting its capacity to model long-range temporal dependencies—a critical shortcoming in robotic sequential manipulation tasks where disambiguating current observations requires rich historical context. To address this, we introduce Mamba—a state space model (SSM)—into imitation learning for the first time, proposing an end-to-end framework that models the full trajectory history. Our method recursively compresses the entire observation history, including multimodal inputs, into a hidden state via efficient recurrent propagation, enabling non-Markovian action prediction. Crucially, it abandons restrictive local-state assumptions and explicitly encodes long-horizon temporal dependencies. We evaluate on ACT dataset tasks, RoboMimic, LIBERO, and real-world sequential manipulation benchmarks, consistently outperforming both ACT and Diffusion Policy. The results demonstrate that holistic history modeling is essential for robust sequential decision-making in complex, context-sensitive robotic tasks.

📝 Abstract
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, limiting their applicability to tasks where historical context is crucial for disambiguating current observations. This limitation hinders performance in long-horizon sequential manipulation tasks where the correct action depends on past events not fully captured by the current state. To address this fundamental challenge, we introduce Mamba Temporal Imitation Learning (MTIL), a novel approach that leverages the recurrent state dynamics inherent in State Space Models (SSMs), specifically the Mamba architecture. MTIL encodes the entire trajectory history into a compressed hidden state, conditioning action predictions on this comprehensive temporal context alongside current multi-modal observations. Through extensive experiments on simulated benchmarks (ACT dataset tasks, RoboMimic, LIBERO) and real-world sequential manipulation tasks specifically designed to probe temporal dependencies, MTIL significantly outperforms state-of-the-art methods like ACT and Diffusion Policy. Our findings affirm the necessity of full temporal context for robust sequential decision-making and validate MTIL as a powerful approach that transcends the inherent limitations of Markovian imitation learning.
Problem

Research questions and friction points this paper is trying to address.

Overcoming Markov assumption limits in imitation learning
Enhancing action prediction with full trajectory history
Improving performance in long-horizon sequential manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Mamba architecture for temporal encoding
Encodes full trajectory history into hidden state
Conditions actions on multi-modal temporal context
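The core idea above—recursively folding every observation into a compressed hidden state so that action prediction depends on the whole trajectory rather than the current frame—can be sketched with a toy linear state-space recurrence. This is an illustrative simplification, not the authors' MTIL implementation (a real Mamba layer uses input-dependent, selective state dynamics); all class and variable names here are invented for the example.

```python
import numpy as np

class RecurrentSSMPolicy:
    """Toy non-Markovian policy in the spirit of MTIL (illustrative only).

    The hidden state h_t compresses the entire observation history:
        h_t = A h_{t-1} + B o_t      (recurrent history compression)
        a_t = C h_t + D o_t          (action conditioned on history + current obs)
    """

    def __init__(self, obs_dim, state_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Contractive state matrix keeps the recurrence stable over long horizons.
        self.A = 0.95 * np.eye(state_dim)
        self.B = rng.standard_normal((state_dim, obs_dim)) / np.sqrt(obs_dim)
        self.C = rng.standard_normal((act_dim, state_dim)) / np.sqrt(state_dim)
        self.D = rng.standard_normal((act_dim, obs_dim)) / np.sqrt(obs_dim)
        self.h = np.zeros(state_dim)

    def reset(self):
        # Start of a new trajectory: clear the compressed history.
        self.h = np.zeros_like(self.h)

    def step(self, obs):
        # Fold the new observation into the hidden state, then predict an action.
        self.h = self.A @ self.h + self.B @ obs
        return self.C @ self.h + self.D @ obs
```

Because the action depends on `h_t` and not just `o_t`, two identical current observations preceded by different histories yield different actions—exactly the disambiguation a Markovian policy cannot perform.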
Yulin Zhou
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Yuankai Lin
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Fanzhe Peng
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Jiahui Chen
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Zhuang Zhou
School of Future Technology, Huazhong University of Science and Technology
Kaiji Huang
School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Hua Yang
Redrock Biometrics
Biometrics, Motion Tracking, Computer Vision, Augmented Reality, Image Processing
Zhouping Yin
Professor of Mechanical Science and Engineering, Huazhong University of Science and Technology
Electronic Manufacturing, Digital Modelling