Causal Information Prioritization for Efficient Reinforcement Learning

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
In reinforcement learning, inefficient exploration arises from neglecting causal relationships among states, actions, and rewards. To address this, we propose the first reward-guided causal-enhanced RL framework: it models causal structure via a factored MDP, introduces a causal information prioritization mechanism, integrates counterfactual data augmentation with a causal information bottleneck constraint, and formulates a reward-conditioned causal empowerment maximization objective. The method enables interpretable and efficient causal feature selection and utilization in a model-free setting. Evaluated across 39 continuous-control tasks spanning five benchmark domains, including pixel-based observations and sparse-reward settings, it significantly outperforms state-of-the-art methods, achieving an average 37% improvement in sample efficiency.

📝 Abstract
Current Reinforcement Learning (RL) methods often suffer from sample inefficiency, caused by blind exploration strategies that neglect causal relationships among states, actions, and rewards. Although recent causal approaches aim to address this problem, they lack a grounded, reward-guided causal model of states and actions for goal-oriented behavior, which impairs learning efficiency. To tackle this issue, we propose a novel method named Causal Information Prioritization (CIP) that improves sample efficiency by leveraging factored MDPs to infer causal relationships between different dimensions of states and actions with respect to rewards, enabling the prioritization of causal information. Specifically, CIP identifies and leverages causal relationships between states and rewards to execute counterfactual data augmentation, prioritizing high-impact state features under a causal understanding of the environment. Moreover, CIP integrates a causality-aware empowerment learning objective, which significantly enhances the agent's execution of reward-guided actions for more efficient exploration in complex environments. To fully assess the effectiveness of CIP, we conduct extensive experiments across 39 tasks in 5 diverse continuous control environments, encompassing both locomotion and manipulation skill learning with pixel-based and sparse reward settings. Experimental results demonstrate that CIP consistently outperforms existing RL methods across a wide range of scenarios.
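To illustrate the counterfactual data augmentation idea described in the abstract: once a factored MDP identifies which state dimensions are causally linked to reward, reward-irrelevant dimensions can be resampled from other transitions without invalidating the observed reward. The sketch below is illustrative only, assuming a boolean mask over state dimensions; the function name and signature are hypothetical, not the paper's implementation.

```python
import numpy as np

def counterfactual_augment(states, causal_mask, rng=None):
    """Swap reward-irrelevant state dimensions between transitions.

    states:      (N, D) array of observed states.
    causal_mask: boolean (D,) array, True for dimensions causally
                 linked to reward (these are preserved).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    states = np.asarray(states, dtype=float)
    # Pair each transition with a randomly chosen other transition.
    perm = rng.permutation(len(states))
    augmented = states.copy()
    # Replace non-causal dimensions with values from the paired
    # transitions; causal dimensions stay intact, so the recorded
    # reward remains a valid label for the augmented state.
    augmented[:, ~causal_mask] = states[perm][:, ~causal_mask]
    return augmented
```

The key invariant is that every dimension the causal model ties to reward is untouched, so augmented transitions can be replayed with their original rewards.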
Problem

Research questions and friction points this paper is trying to address.

Improve sample efficiency in RL
Model causal relationships in states
Enhance reward-guided action execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages factored MDPs for causal inference
Executes counterfactual data augmentation strategically
Integrates causality-aware empowerment learning objective
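For context on the last point: empowerment in RL is conventionally formalized as the mutual information between an agent's actions and the states they lead to. CIP's causality-aware variant presumably conditions or weights this objective by the inferred causal links to reward; the formula below is the standard definition only, not the paper's exact objective.

```latex
% Standard empowerment: mutual information between action a and
% successor state s', maximized over the action distribution \omega.
\mathcal{E}(s) \;=\; \max_{\omega}\; I(a;\, s' \mid s)
             \;=\; \max_{\omega}\; \big[\, H(a \mid s) \;-\; H(a \mid s', s) \,\big]
```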
Hongye Cao
Chang'an University
Remote sensing
Fan Feng
University of California, San Diego; MBZUAI
Tianpei Yang
National Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University
Jing Huo
Nanjing University
Machine Learning, Computer Vision
Yang Gao
National Key Laboratory for Novel Software Technology, Nanjing University; School of Intelligence Science and Technology, Nanjing University