Towards Empowerment Gain through Causal Structure Learning in Model-Based RL

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak interpretability, low controllability, and inefficient exploration in model-based reinforcement learning (MBRL), this paper proposes ECL, a causality-empowered framework that jointly optimizes empowerment maximization—defined as the mutual information between future states and actions—and causal dynamics modeling. ECL introduces an updateable causal structure mechanism to enhance the controllability and task adaptability of learned dynamics models, while supporting plug-and-play integration of arbitrary differentiable causal discovery methods, thereby balancing generalizability and task-specificity. Evaluated on six benchmark environments—including those with pixel observations—ECL substantially outperforms existing causal MBRL approaches: it achieves higher causal discovery accuracy, improves sample efficiency by an average factor of 2.1×, and attains superior asymptotic performance.
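The empowerment quantity referenced above has a standard information-theoretic form: the maximal mutual information an agent's actions carry about future states, given the current state. A common statement (the notation here is generic, not taken from the paper) is:

```latex
\mathcal{E}(s_t) \;=\; \max_{\pi(a_t \mid s_t)} \, I(s_{t+1}; a_t \mid s_t)
\;=\; \max_{\pi} \; \mathbb{E}_{\pi, p}\!\left[ \log \frac{p(s_{t+1} \mid s_t, a_t)}{p(s_{t+1} \mid s_t)} \right]
```

Intuitively, empowerment is high when different actions reliably lead to distinguishable future states, i.e., when the agent has control over its environment.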

📝 Abstract
In Model-Based Reinforcement Learning (MBRL), incorporating causal structures into dynamics models provides agents with a structured understanding of their environments, enabling efficient decision-making. Empowerment, as an intrinsic motivation, enhances agents' ability to actively control their environments by maximizing the mutual information between future states and actions. We posit that empowerment coupled with causal understanding can improve controllability, while enhanced empowerment gain can further facilitate causal reasoning in MBRL. To improve learning efficiency and controllability, we propose a novel framework, Empowerment through Causal Learning (ECL), in which an agent equipped with a causal dynamics model performs empowerment-driven exploration and optimizes its causal structure for task learning. Specifically, ECL first trains a causal dynamics model of the environment on collected data. It then maximizes empowerment under the causal structure for exploration, while using the data gathered through exploration to update the causal dynamics model so that it is more controllable than a dense dynamics model without causal structure. During downstream task learning, an intrinsic curiosity reward is included to balance reliance on the learned causal structure, mitigating overfitting. Importantly, ECL is method-agnostic and can integrate various causal discovery methods. We evaluate ECL combined with three causal discovery methods across six environments, including pixel-based tasks, demonstrating its superior performance compared to other causal MBRL methods in terms of causal discovery accuracy, sample efficiency, and asymptotic performance.
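As a toy illustration of the quantity the abstract optimizes — the mutual information between actions and future states — the sketch below computes I(S'; A) from an empirical joint distribution for a fully controllable one-step environment versus an uncontrollable one. This is only a didactic calculation under a uniform action policy; the paper's method instead estimates empowerment through learned causal dynamics models.

```python
import numpy as np

def mutual_information(joint):
    """I(S'; A) in bits from a joint table joint[a, s'] = P(a, s')."""
    pa = joint.sum(axis=1, keepdims=True)   # marginal P(a)
    ps = joint.sum(axis=0, keepdims=True)   # marginal P(s')
    nz = joint > 0                          # skip zero-probability cells
    return float((joint[nz] * np.log2(joint[nz] / (pa @ ps)[nz])).sum())

# Controllable one-step environment: each of 2 actions (uniform policy)
# deterministically reaches a distinct next state.
joint_controllable = np.array([[0.5, 0.0],
                               [0.0, 0.5]])

# Uncontrollable environment: both actions induce the same next-state
# distribution, so actions carry no information about the future.
joint_uncontrollable = np.array([[0.25, 0.25],
                                 [0.25, 0.25]])

emp_hi = mutual_information(joint_controllable)    # 1.0 bit
emp_lo = mutual_information(joint_uncontrollable)  # 0.0 bits
print(emp_hi, emp_lo)
```

An empowerment-maximizing agent steers toward regions where its action–future-state channel looks like the first table rather than the second.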
Problem

Research questions and friction points this paper is trying to address.

Enhancing controllability in MBRL
Integrating causal structures for empowerment
Improving learning efficiency with causal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal dynamics model training
Empowerment-driven exploration optimization
Intrinsic curiosity reward integration
👥 Authors

Hongye Cao
Chang'an University
Remote Sensing

Fan Feng
University of California, San Diego, MBZUAI

Meng Fang
University of Liverpool
Natural Language Processing, Reinforcement Learning, Agents, Artificial Intelligence

Shaokang Dong
Honor Device Co., Ltd
Multi-agent RL, RLHF, LLM Agent

Tianpei Yang
National Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University

Jing Huo
Nanjing University
Machine Learning, Computer Vision

Yang Gao
National Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University