ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline reinforcement learning suffers from hindered reward propagation, biased value estimation, and degraded policy performance when static datasets contain suboptimal, fragmented trajectories. Existing generative trajectory stitching methods are either confined to the support of the behavior policy or violate dynamical consistency. To address this, we propose a dynamics-guided adaptive trajectory stitching framework. Our core contributions are: (i) modeling state-wise reachability via a temporal-distance representation; and (ii) a dynamics-consistent rollout deviation feedback mechanism that adaptively plans and generates novel yet physically feasible connecting action sequences. Without any environment interaction, our method enhances dataset quality, significantly improving the stability and generalization of policy learning. On the D4RL and OGBench benchmarks, it consistently outperforms state-of-the-art offline RL approaches, achieving superior and robust policy performance.

📝 Abstract
Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While trajectory stitching via generative models offers a promising solution, existing augmentation methods frequently produce trajectories that are either confined to the support of the behavior policy or violate the underlying dynamics, thereby limiting their effectiveness for policy improvement. We propose ASTRO, a data augmentation framework that generates distributionally novel and dynamics-consistent trajectories for offline RL. ASTRO first learns a temporal-distance representation to identify distinct and reachable stitch targets. We then employ a dynamics-guided stitch planner that adaptively generates connecting action sequences via Rollout Deviation Feedback, defined as the gap between the target state sequence and the state sequence actually reached by executing the predicted actions, to improve the feasibility and reachability of trajectory stitching. This approach facilitates effective augmentation through stitching and ultimately enhances policy learning. ASTRO outperforms prior offline RL augmentation methods across various algorithms, achieving notable performance gains on the challenging OGBench suite and demonstrating consistent improvements on standard offline RL benchmarks such as D4RL.
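The abstract's first step, selecting reachable stitch targets via a temporal-distance representation, can be sketched as follows. This is a minimal illustration, assuming a learned embedding `phi` in which Euclidean distance approximates steps-to-reach; the function name `reachable_stitch_targets` and the threshold parameter `max_dist` are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def reachable_stitch_targets(phi, state, candidates, max_dist):
    """Rank candidate states by temporal distance in embedding space and
    keep those estimated to be reachable within `max_dist` steps."""
    z = phi(state)                              # embed the current state
    zc = np.stack([phi(c) for c in candidates]) # embed all candidates
    dists = np.linalg.norm(zc - z, axis=1)      # proxy for steps-to-reach
    order = np.argsort(dists)                   # nearest (soonest) first
    return [candidates[i] for i in order if dists[i] <= max_dist]
```

With an identity embedding on toy 2-D states, a candidate at distance 0.5 ranks before one at distance 1.0, and a candidate at distance 3.0 is filtered out by `max_dist=2.0`.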
Problem

Research questions and friction points this paper is trying to address.

Addresses suboptimal, fragmented trajectories in offline reinforcement learning
Generates distributionally novel, dynamics-consistent trajectories for augmentation
Improves trajectory stitching feasibility via adaptive dynamics-guided planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates dynamics-consistent trajectories via adaptive stitching
Learns temporal-distance representation to identify reachable targets
Employs rollout deviation feedback to improve trajectory feasibility
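The rollout-deviation-feedback idea above can be sketched as a closed loop: roll the planned actions through a dynamics model, measure how far the visited states drift from the target state sequence, and re-plan until the stitch is dynamics-consistent. This is a minimal sketch under stated assumptions; `dynamics_fn`, `planner_fn`, the tolerance `tol`, and the iteration budget are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def rollout_deviation(dynamics_fn, s0, actions, target_states):
    """Execute predicted actions through a (learned) dynamics model and
    return the per-step gap to the target state sequence."""
    s = s0
    deviations = []
    for a, s_target in zip(actions, target_states):
        s = dynamics_fn(s, a)                    # one-step rollout
        deviations.append(np.linalg.norm(s - s_target))
    return np.array(deviations)

def adaptive_stitch(dynamics_fn, planner_fn, s0, target_states,
                    tol=0.1, max_iters=10):
    """Re-plan connecting actions, feeding the deviation back to the
    planner, until the rollout stays within `tol` of the targets."""
    actions = planner_fn(s0, target_states, feedback=None)
    for _ in range(max_iters):
        dev = rollout_deviation(dynamics_fn, s0, actions, target_states)
        if dev.max() <= tol:
            break                                # feasible, reachable stitch
        actions = planner_fn(s0, target_states, feedback=dev)
    return actions, dev
```

With toy linear dynamics `s' = s + a` and a planner that outputs successive target differences, the rollout matches the targets exactly and the loop exits on the first check.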
👥 Authors
Hang Yu, Tongji University
Di Zhang, Tongji University
Qiwei Du, University at Buffalo (Neuro-symbolic AI, Robotics, Planning, Computer Vision)
Yanping Zhao, Tongji University
Hai Zhang, Tongji University
Guang Chen, Tongji University
Eduardo E. Veas, Graz University of Technology
Junqiao Zhao, Department of Computer Science and Technology, Tongji University (SLAM, Localization, Reinforcement Learning, Autonomous Driving)