Consistency Trajectory Planning: High-Quality and Efficient Trajectory Optimization for Offline Model-Based Reinforcement Learning

📅 2025-07-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In offline model-based reinforcement learning, diffusion-based trajectory planning suffers from high computational cost due to iterative multi-step sampling. To address this, this work introduces Consistency Trajectory Planning (CTP), the first application of consistency trajectory models to this setting, enabling high-quality single-step trajectory generation. CTP combines consistency modeling with a goal-conditioned long-horizon planning framework. Evaluated on the D4RL benchmark, CTP outperforms existing diffusion-based planners on long-horizon, goal-conditioned tasks, attaining higher normalized returns with far fewer denoising steps, and reaches comparable performance with over 120× faster inference. This work is the first to demonstrate the efficacy of consistency models for trajectory planning in offline model-based RL, delivering both efficiency and strong performance.

📝 Abstract
This paper introduces Consistency Trajectory Planning (CTP), a novel offline model-based reinforcement learning method that leverages the recently proposed Consistency Trajectory Model (CTM) for efficient trajectory optimization. While prior work applying diffusion models to planning has demonstrated strong performance, it often suffers from high computational costs due to iterative sampling procedures. CTP supports fast, single-step trajectory generation without significant degradation in policy quality. We evaluate CTP on the D4RL benchmark and show that it consistently outperforms existing diffusion-based planning methods in long-horizon, goal-conditioned tasks. Notably, CTP achieves higher normalized returns while using significantly fewer denoising steps. In particular, CTP achieves comparable performance with over $120\times$ speedup in inference time, demonstrating its practicality and effectiveness for high-performance, low-latency offline planning.
Problem

Research questions and friction points this paper is trying to address.

Efficient trajectory optimization for offline reinforcement learning
Reducing computational costs in diffusion-based planning methods
Achieving high performance with low-latency offline planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Consistency Trajectory Model (CTM)
Single-step trajectory generation
Achieves high speedup in inference
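The single-step idea in the bullets above can be sketched as follows. This is a toy illustration with stand-in functions, not the paper's implementation: an iterative diffusion planner calls its denoising network once per step, while a consistency model maps a fully noised trajectory to a clean one in a single evaluation. The `shrink` and `one_shot` lambdas are hypothetical placeholders for trained networks.

```python
import numpy as np

def diffusion_sample(denoise_step, x_T, n_steps):
    """Baseline diffusion planner: n_steps network evaluations."""
    x = x_T
    for t in range(n_steps, 0, -1):
        x = denoise_step(x, t)
    return x

def consistency_sample(consistency_fn, x_T, T):
    """Consistency-style sampling: a single network evaluation."""
    return consistency_fn(x_T, T)

# Toy setup: a "trajectory" is a (horizon, obs_dim) array; pretend the
# clean trajectory is all zeros, so both samplers should move toward it.
horizon, obs_dim = 32, 4
rng = np.random.default_rng(0)
x_T = rng.normal(size=(horizon, obs_dim))          # pure-noise trajectory

shrink = lambda x, t: x * (1 - 1 / (t + 1))        # one toy denoising step
one_shot = lambda x, T: np.zeros_like(x)           # toy consistency map

slow = diffusion_sample(shrink, x_T, n_steps=100)  # 100 evaluations
fast = consistency_sample(one_shot, x_T, T=100)    # 1 evaluation
```

The speedup reported in the abstract comes from exactly this difference in evaluation counts: one forward pass instead of a long denoising chain.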
Guanquan Wang
Department of Information and Communication Engineering, The University of Tokyo
Takuya Hiraoka
NEC Corporation, Tokyo, Japan
Yoshimasa Tsuruoka
The University of Tokyo
Natural Language Processing · Reinforcement Learning · Artificial Intelligence for Games