Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

📅 2025-05-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of high-quality trajectories and poor generalization of suboptimal data in offline reinforcement learning, this paper proposes the Counterfactual Reasoning Decision Transformer (CRDT). CRDT is the first approach to integrate counterfactual reasoning into the decision transformer framework without modifying the model architecture: it generates counterfactual experiences by concatenating suboptimal trajectories, enabling both data augmentation and behavior-cloning-style goal-conditioned modeling. Evaluated on Atari and D4RL benchmarks, CRDT consistently outperforms the standard Decision Transformer. Under data-constrained and dynamics-shift settings, it achieves an average performance gain of 23.6%, demonstrating markedly improved few-shot adaptability and cross-scenario trajectory stitching capability. CRDT establishes a novel paradigm for sequential decision-making modeling under low-quality offline data conditions.

Technology Category

Application Category

📝 Abstract
Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, the lack of training data and the scarcity of optimal behaviours make training on offline datasets challenging, as suboptimal data can hinder performance. To address this, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning. CRDT enhances DT ability to reason beyond known data by generating and utilizing counterfactual experiences, enabling improved decision-making in unseen scenarios. Experiments across Atari and D4RL benchmarks, including scenarios with limited data and altered dynamics, demonstrate that CRDT outperforms conventional DT approaches. Additionally, reasoning counterfactually allows the DT agent to obtain stitching abilities, combining suboptimal trajectories, without architectural modifications. These results highlight the potential of counterfactual reasoning to enhance reinforcement learning agents' performance and generalization capabilities.
Problem

Research questions and friction points this paper is trying to address.

DT needs high-quality data but often lacks optimal training data
CRDT improves DT by generating counterfactual experiences for better decisions
CRDT outperforms DT in limited data and altered dynamics scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

CRDT enhances DT with counterfactual reasoning
Generates counterfactual experiences for unseen scenarios
Improves stitching abilities without architectural changes
🔎 Similar Papers
No similar papers found.
M
Minh Hoang Nguyen
Applied AI Institute, Deakin University, Australia
L
Linh Le Pham Van
Applied AI Institute, Deakin University, Australia
Thommen George Karimpanal
Thommen George Karimpanal
Lecturer (Assistant Professor), Deakin University
Reinforcement LearningArtificial IntelligenceHuman AlignmentArtificial Life
S
Sunil Gupta
Applied AI Institute, Deakin University, Australia
H
Hung Le
Applied AI Institute, Deakin University, Australia