Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

📅 2025-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity of high-quality trajectories and poor generalization of suboptimal data in offline reinforcement learning, this paper proposes the Counterfactual Reasoning Decision Transformer (CRDT). CRDT is the first approach to integrate counterfactual reasoning into the decision transformer framework without modifying the model architecture: it generates counterfactual experiences by concatenating suboptimal trajectories, enabling both data augmentation and behavior-cloning-style goal-conditioned modeling. Evaluated on Atari and D4RL benchmarks, CRDT consistently outperforms the standard Decision Transformer. Under data-constrained and dynamics-shift settings, it achieves an average performance gain of 23.6%, demonstrating markedly improved few-shot adaptability and cross-scenario trajectory stitching capability. CRDT establishes a novel paradigm for sequential decision-making modeling under low-quality offline data conditions.

Technology Category

Application Category

📝 Abstract

Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, the lack of training data and the scarcity of optimal behaviours make training on offline datasets challenging, as suboptimal data can hinder performance. To address this, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning. CRDT enhances DT ability to reason beyond known data by generating and utilizing counterfactual experiences, enabling improved decision-making in unseen scenarios. Experiments across Atari and D4RL benchmarks, including scenarios with limited data and altered dynamics, demonstrate that CRDT outperforms conventional DT approaches. Additionally, reasoning counterfactually allows the DT agent to obtain stitching abilities, combining suboptimal trajectories, without architectural modifications. These results highlight the potential of counterfactual reasoning to enhance reinforcement learning agents' performance and generalization capabilities.

Problem

Research questions and friction points this paper is trying to address.

DT needs high-quality data but often lacks optimal training data

CRDT improves DT by generating counterfactual experiences for better decisions

CRDT outperforms DT in limited data and altered dynamics scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

CRDT enhances DT with counterfactual reasoning

Generates counterfactual experiences for unseen scenarios

Improves stitching abilities without architectural changes

🔎 Similar Papers

No similar papers found.

Authors to Follow