Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of balancing goal-directed behavior and exploratory actions under uncertainty in real-world robotic environments, this paper proposes a hierarchical deep active inference framework. Methodologically, it introduces a temporally hierarchical world model, which integrates variational autoencoders with hierarchical RNNs, and incorporates vector-quantized action abstraction to enable multi-timescale state prediction and efficient action selection. The key contributions are: (i) the first integration of action abstraction with hierarchical active inference, unifying environmental dynamics representation and computational efficiency; and (ii) autonomous switching between goal-directed and exploratory behaviors. Evaluated on real-world robotic object manipulation tasks, the framework achieves significantly higher task success rates and reduces action decision-making overhead by 62%, demonstrating its effectiveness and practicality in complex, uncertain settings.

📝 Abstract
Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
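The action-compression step described in the abstract can be illustrated with a minimal vector-quantization sketch. This is not the authors' implementation: the `quantize` function, the three-entry codebook, and the use of a raw flattened action sequence in place of a learned encoder output are all assumptions made for illustration. It only shows the core idea of replacing a continuous action sequence with the index of its nearest codebook entry.

```python
def quantize(sequence, codebook):
    """Return (index, code) of the codebook entry nearest to an action sequence.

    `sequence` is a list of per-step action vectors; it is flattened and compared
    to each codebook entry by squared Euclidean distance. In a learned model the
    comparison would typically use an encoder output rather than the raw sequence.
    """
    flat = [x for step in sequence for x in step]

    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(flat, code))

    index = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))
    return index, codebook[index]


# Hypothetical codebook: three abstract actions, each a flattened
# 2-step sequence of 2-dimensional actions.
codebook = [
    [0.0, 0.0, 0.0, 0.0],  # stay still
    [1.0, 0.0, 1.0, 0.0],  # move along the first axis
    [0.0, 1.0, 0.0, 1.0],  # move along the second axis
]

sequence = [[0.9, 0.1], [1.1, -0.1]]
index, code = quantize(sequence, codebook)
print(index, code)
```

Because every sequence maps to one of a finite number of indices, a downstream planner only has to consider as many candidates as there are codebook entries.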
Problem

Research questions and friction points this paper is trying to address.

Addresses robot control in uncertain real-world environments
Overcomes limited representation and high computational cost in action selection
Enables goal-directed and exploratory actions with a hierarchical world model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical world model with multi-timescale dynamics
Action compression via vector quantization for efficiency
Abstract world model enables low-cost action selection
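To make the low-cost action selection point concrete, the sketch below enumerates a small set of discrete abstract actions and scores each one with a simplified stand-in for expected free energy. Everything here is hypothetical: `select_abstract_action`, `toy_predict`, the `curiosity` weight, and the use of predictive variance as a proxy for the epistemic (information-gain) term are illustrative assumptions, not the paper's objective or model.

```python
def select_abstract_action(slow_state, goal, predict, num_actions, curiosity=1.0):
    """Pick the abstract action index with the lowest score.

    score = squared error to the goal (pragmatic proxy)
            - curiosity * total predictive variance (epistemic proxy).
    `predict(slow_state, k)` must return (mean, variance) lists for action k.
    """
    best_index, best_score = None, float("inf")
    for k in range(num_actions):
        mean, variance = predict(slow_state, k)
        goal_error = sum((m - g) ** 2 for m, g in zip(mean, goal))
        score = goal_error - curiosity * sum(variance)
        if score < best_score:
            best_index, best_score = k, score
    return best_index


def toy_predict(slow_state, k):
    """Hypothetical abstract world model: a fixed shift and variance per action."""
    shifts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    variances = [[0.0, 0.0], [0.1, 0.1], [2.0, 2.0]]
    mean = [s + d for s, d in zip(slow_state, shifts[k])]
    return mean, variances[k]


state, goal = [0.0, 0.0], [1.0, 0.0]
goal_directed = select_abstract_action(state, goal, toy_predict, 3, curiosity=0.0)
exploratory = select_abstract_action(state, goal, toy_predict, 3, curiosity=1.0)
print(goal_directed, exploratory)
```

With the epistemic weight at zero the goal-reaching action wins; with the weight at one the high-uncertainty action wins. The explicit weight is only a device for this toy example. The cost of selection is a single pass over the codebook rather than a search over continuous action sequences.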
Kentaro Fujii
Graduate School of Integrated Design Engineering, Keio University
Shingo Murata
Keio University
Cognitive Robotics, Robot Learning, Computational Psychiatry, Active Inference