🤖 AI Summary
This work addresses the challenges of controlling autonomous bicycles, which are underactuated and highly nonlinear systems for which traditional control methods suffer from sensitivity to model inaccuracies and poor real-world adaptability. The study proposes the first end-to-end deep reinforcement learning Sim-to-Real framework tailored for autonomous bicycles. Leveraging a high-fidelity simulation environment built on NVIDIA Isaac Sim, the approach trains a neural controller using the Proximal Policy Optimization (PPO) algorithm, enhanced by a custom composite reward function and systematic domain randomization to enable robust policy transfer without requiring an explicit dynamics model. Experimental results demonstrate a 99.90% balance success rate in simulation, with steering and velocity tracking errors of 1.15° and 0.18 m/s, respectively. The trained policy was successfully deployed on a physical platform, validating both its effectiveness and sim-to-real transfer capability.
📝 Abstract
Autonomous bicycles offer a promising agile solution for urban mobility and last-mile logistics. However, conventional control strategies often struggle with their underactuated nonlinear dynamics, suffering from sensitivity to model mismatches and limited adaptability to real-world uncertainties. To address this, this paper presents CycleRL, the first sim-to-real deep reinforcement learning framework designed for robust autonomous bicycle control. Our approach trains an end-to-end neural control policy within the high-fidelity NVIDIA Isaac Sim environment, leveraging Proximal Policy Optimization (PPO) to circumvent the need for an explicit dynamics model. The framework features a composite reward function tailored for concurrent balance maintenance, velocity tracking, and steering control. Crucially, systematic domain randomization is employed to bridge the simulation-to-reality gap and facilitate direct transfer. In simulation, CycleRL achieves strong performance, including a 99.90% balance success rate, a steering tracking error of 1.15°, and a velocity tracking error of 0.18 m/s. These quantitative results, coupled with successful hardware transfer, validate DRL as an effective paradigm for autonomous bicycle control, offering superior adaptability over traditional methods. Video demonstrations are available at https://anony6f05.github.io/CycleRL/.
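
The abstract does not give the form of the composite reward or the randomized parameters, so the following is only a minimal sketch of how the two ideas could look in code. Everything in it is an assumption made for illustration: the names `RewardWeights`, `composite_reward`, and `sample_randomized_params`, the weights, the exponential shaping of each term, the 45° roll limit, and the ±20% randomization spread. None of these are taken from CycleRL.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the paper's actual values are not stated in the abstract.
    balance: float = 1.0
    velocity: float = 0.5
    steering: float = 0.5


def composite_reward(roll, v, v_ref, steer, steer_ref,
                     w=RewardWeights(), roll_limit=math.radians(45.0)):
    """Weighted sum of balance, velocity-tracking, and steering-tracking terms.

    Angles are in radians and speeds in m/s. Returns (reward, fallen).
    The roll limit and the fall penalty of -1.0 are illustrative assumptions.
    """
    if abs(roll) > roll_limit:
        # Treat an excessive roll angle as a fall and end the episode.
        return -1.0, True
    # Each term equals 1.0 at zero error and decays smoothly as the error grows.
    r_balance = math.exp(-(roll / roll_limit) ** 2)
    r_velocity = math.exp(-(v - v_ref) ** 2)
    r_steering = math.exp(-(steer - steer_ref) ** 2)
    reward = (w.balance * r_balance
              + w.velocity * r_velocity
              + w.steering * r_steering)
    return reward, False


def sample_randomized_params(rng, nominal, spread=0.2):
    """Scale every nominal parameter by a uniform factor in [1 - spread, 1 + spread]."""
    return {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
            for name, value in nominal.items()}
```

In a training loop of this kind, `sample_randomized_params` would typically be called once per episode reset (for example on a dictionary of mass and friction values), and `composite_reward` once per simulation step. Combining the three objectives in a single scalar lets one policy be optimized for them concurrently, at the cost of having to tune the relative weights.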