🤖 AI Summary
This work addresses the challenge of reinforcement learning for real-world high-speed autonomous driving without human intervention for resets. The authors develop a continuous autonomous training framework on a 1/10-scale physical vehicle platform, integrating Model Predictive Path Integral (MPPI) control as both a safety baseline and an automatic reset mechanism, and incorporating residual learning. They present the first systematic evaluation of PPO, SAC, and TD-MPC2 under no-reset conditions in both simulation and real-world settings. Results demonstrate that TD-MPC2 is the only algorithm that consistently outperforms the MPPI baseline on the physical platform. Surprisingly, while residual learning improves performance in simulation, it degrades real-world performance, highlighting a significant sim-to-real gap and exposing limitations of current transfer methodologies.
📝 Abstract
This paper presents an empirical study of reset-free reinforcement learning (RL) for real-world agile driving, in which a physical 1/10-scale vehicle learns continuously on a slippery indoor track without manual resets. High-speed driving near the limits of tire friction is particularly challenging for learning-based methods because complex vehicle dynamics, actuation delays, and other unmodeled effects hinder both accurate simulation and direct sim-to-real transfer of learned policies. To enable autonomous training on a physical platform, we employ Model Predictive Path Integral control (MPPI) as both the reset policy and the base policy for residual learning, and systematically compare three representative RL algorithms, namely PPO, SAC, and TD-MPC2, with and without residual learning in simulation and real-world experiments. Our results reveal a clear gap between simulation and the real world: SAC with residual learning achieves the highest returns in simulation, yet only TD-MPC2 consistently outperforms the MPPI baseline on the physical platform. Moreover, residual learning, while clearly beneficial in simulation, fails to transfer its advantage to the real world and can even degrade performance. These findings show that reset-free RL in the real world poses unique challenges absent from simulation, calling for further algorithmic development tailored to training in the wild.
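To make the MPPI-plus-residual setup concrete, here is a minimal, illustrative sketch of the two ingredients the abstract names: an MPPI controller that samples noisy action sequences, rolls them out through a model, and returns a cost-weighted average of the first actions; and a residual policy that adds a learned correction on top of the MPPI command. This is not the paper's implementation — the function names, the toy 1D dynamics, and all hyperparameters (`horizon`, `samples`, `sigma`, `lam`) are assumptions for exposition, and the real controller would use the vehicle's dynamics and a track-following cost.

```python
import numpy as np

def mppi_action(state, dynamics, cost, horizon=20, samples=64,
                sigma=0.5, lam=1.0, rng=None):
    """Minimal MPPI sketch (zero nominal sequence, illustrative only):
    sample action-sequence perturbations, roll out an assumed-known
    dynamics model, and softmin-weight the first actions by total cost."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(samples, horizon))
    costs = np.zeros(samples)
    for k in range(samples):
        s = state
        for t in range(horizon):
            s = dynamics(s, noise[k, t])   # roll out candidate action
            costs[k] += cost(s)            # accumulate trajectory cost
    w = np.exp(-(costs - costs.min()) / lam)  # lower cost -> larger weight
    w /= w.sum()
    return float(w @ noise[:, 0])          # cost-weighted first action

def residual_action(state, rl_policy, dynamics, cost):
    """Residual learning: final command = MPPI base action + learned
    correction from the RL policy (here `rl_policy` is any callable)."""
    base = mppi_action(state, dynamics, cost)
    return base + rl_policy(state)
```

In this decomposition the RL algorithm only has to learn a correction around an already-competent controller, which is what makes MPPI usable both as a safety/reset baseline and as the base policy; the paper's finding is that this helps in simulation but, perhaps surprisingly, not on the physical vehicle.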