Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

📅 2026-06-08
🤖 AI Summary
This work addresses the challenges of reinforcement learning for autonomous motorcycle racing, which stem from the interplay of balance control, lean-angle regulation, and highly reactive steering and throttle response. It combines Soft Actor-Critic (SAC) with self-paced curriculum learning inside a physics-accurate simulator: the method generates a sequence of tasks that progresses from easy to difficult and adapts to the agent's evolving performance, so no manually designed curriculum is needed. The state representation extends proprioceptive features with lean-angle history and adds global track features derived from course points (track waypoints). Preliminary experimental results show that the curriculum-based agent outperforms SAC alone in training efficiency, lap time, and driving stability across multiple tracks and motorcycle models. The authors present this as a first baseline for reinforcement-learning-based autonomous motorcycle racing.
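As a rough illustration of the self-paced idea described above, the following Python sketch keeps a Gaussian over a scalar task difficulty and shifts it toward the hardest task only while the agent's average return stays above a threshold. It is not the SPDL update itself, which solves a KL-constrained optimization over the task (context) distribution. The class `SelfPacedSampler`, the function `toy_return`, the scalar difficulty context, and all numeric values are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class SelfPacedSampler:
    """Gaussian over a scalar task difficulty in [0, 1] (hypothetical context)."""
    mean: float = 0.1              # start near the easy end
    std: float = 0.05
    target_mean: float = 1.0       # hardest task (assumed encoding)
    max_step: float = 0.05         # crude stand-in for a KL trust region
    return_threshold: float = 0.6  # performance required before advancing

    def sample(self, rng: random.Random) -> float:
        """Draw a task difficulty, clipped to the valid range."""
        return min(1.0, max(0.0, rng.gauss(self.mean, self.std)))

    def update(self, mean_return: float) -> None:
        """Shift toward the target only when the agent performs well enough."""
        if mean_return >= self.return_threshold:
            step = min(self.max_step, self.target_mean - self.mean)
            self.mean += max(0.0, step)


def toy_return(difficulty: float, skill: float) -> float:
    """Placeholder for an evaluation rollout: drops as difficulty exceeds skill."""
    return max(0.0, min(1.0, 1.0 - (difficulty - skill)))


rng = random.Random(0)
sampler = SelfPacedSampler()
skill = 0.0
sampled = []
for _ in range(200):
    tasks = [sampler.sample(rng) for _ in range(8)]
    sampled.extend(tasks)
    returns = [toy_return(c, skill) for c in tasks]
    sampler.update(sum(returns) / len(returns))
    skill = min(1.0, skill + 0.01)  # stand-in for SAC policy improvement
```

In this toy loop the curriculum stalls whenever the sampled tasks run too far ahead of the agent's skill and resumes once the returns recover. That pacing behaviour is what a hand-written task schedule would otherwise have to encode.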
📝 Abstract
Autonomous racing has seen remarkable progress through deep Reinforcement Learning (RL), primarily for four-wheeled vehicles. However, motorbikes introduce substantially greater complexity: the controller must manage balance and lean angle while coping with more reactive steering and throttle response and a lower vehicle mass. In this work, we present a framework for training an autonomous agent to race a superbike in VRider SBK, a physics-accurate Unity-based motorbike simulator. Our approach integrates Soft Actor-Critic (SAC) with Self-Paced curriculum Deep reinforcement Learning (SPDL), which dynamically generates progressively more challenging tasks based on the agent's performance, without requiring manual curriculum design. The agent's state space comprises proprioceptive features extended with lean-angle history, along with global track features via course points. The reward signal is shaped to encourage progress along the track while penalizing instability-inducing behaviors specific to two-wheeled dynamics. Preliminary experimental results demonstrate that SAC combined with SPDL outperforms SAC alone in training efficiency, lap time, and driving stability across multiple tracks and motorbike models, establishing a first baseline for RL-based autonomous motorbike racing.
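The state and reward design described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the abstract does not give the history length, the feature layout, or which instability behaviours are penalized. The names `LeanHistoryObservation` and `shaped_reward`, the lean-rate and fall penalties, and all weights are therefore hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class LeanHistoryObservation:
    """Concatenates proprioceptive features, recent lean angles, and course points."""

    def __init__(self, history_len: int = 4) -> None:
        # Fixed-length buffer; the oldest lean angle is dropped automatically.
        self.history: Deque[float] = deque([0.0] * history_len, maxlen=history_len)

    def build(
        self,
        proprio: Sequence[float],
        lean_angle: float,
        course_points: Sequence[float],
    ) -> List[float]:
        """Return a flat observation vector for the current step."""
        self.history.append(lean_angle)
        return list(proprio) + list(self.history) + list(course_points)


def shaped_reward(
    progress_m: float,
    lean_angle: float,
    prev_lean_angle: float,
    fell: bool,
    lean_rate_weight: float = 0.1,  # assumed weight
    fall_penalty: float = 10.0,     # assumed penalty
) -> float:
    """Reward track progress; penalize abrupt lean changes and falls (assumed terms)."""
    reward = progress_m - lean_rate_weight * abs(lean_angle - prev_lean_angle)
    if fell:
        reward -= fall_penalty
    return reward


builder = LeanHistoryObservation(history_len=4)
obs = builder.build(proprio=[30.0, 0.5], lean_angle=0.2, course_points=[1.0, 2.0, 3.0])
r_ok = shaped_reward(progress_m=5.0, lean_angle=0.3, prev_lean_angle=0.1, fell=False)
r_fall = shaped_reward(progress_m=5.0, lean_angle=0.3, prev_lean_angle=0.1, fell=True)
```

Feeding the policy a short lean-angle history lets a feed-forward network infer the lean rate without recurrent memory, which is one plausible reason for including it in the state.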
Problem

Research questions and friction points this paper is trying to address.

Autonomous Racing
Motorbike Control
Reinforcement Learning
Balance and Lean Angle
Two-wheeled Dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Paced Curriculum Learning
Soft Actor-Critic
Autonomous Motorbike Racing
Two-wheeled Dynamics
Reinforcement Learning