🤖 AI Summary
This work addresses the poor transfer of deep reinforcement learning (DRL) policies for double-Ackermann-steering robots, caused by dynamics discrepancies between simulation and reality and by actuation-related uncertainty. The authors extend ManeuverNet from position control to full pose control and adopt a “sim-to-sim-to-real” transfer approach: actuation effects observed in Gazebo are modeled in the PyBullet training environment, and policies are trained with multi-environment DRL. Policies trained with SAC and CrossQ reach a success rate of up to 92% in Gazebo (69% under stricter thresholds) and transfer to a real robot without additional tuning, substantially narrowing the performance gap across simulators.
📝 Abstract
Robust deployment of deep reinforcement learning (DRL) policies on real robots remains challenging due to discrepancies between simulation and real-world dynamics. We address this issue in the context of maneuvering with double-Ackermann-steering mobile robots, which introduce additional constraints due to their non-holonomic nature. Building upon the DRL framework ManeuverNet, we extend its objective from position control to full pose control, resulting in a more challenging task. We further investigate the impact of actuation-related uncertainties on policy transfer. Using simplified actuation models during training of the extended policy can lead to poor generalization, as shown by a drop in success rate from 100% in PyBullet to 25% in Gazebo under stricter evaluation conditions. To address this limitation, we adopt a sim-to-sim-to-real approach, where actuation effects observed in Gazebo are incorporated into the PyBullet training environment. Using multi-environment DRL with SAC and CrossQ, we learn policies that remain robust despite modeling inaccuracies. This approach can significantly reduce the performance gap across simulators, achieving a success rate of up to 92% in Gazebo and maintaining 69% under stricter thresholds, with successful transfer to a real robot without additional tuning.
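
To make the difference between position control and full pose control concrete, the following is a minimal Python sketch of how a pose-based success rate could be computed under looser and stricter thresholds. It is an illustration only: the abstract does not state the evaluation criteria, so the `Pose` type, the function names, and the tolerance parameters `pos_tol` and `yaw_tol` are all assumptions rather than the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Planar pose; a hypothetical type used only for this illustration."""
    x: float    # metres
    y: float    # metres
    yaw: float  # radians


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def pose_reached(final: Pose, goal: Pose, pos_tol: float, yaw_tol: float) -> bool:
    """Return True if both the position and the heading error are within tolerance.

    Position-only control would check just the first condition; full pose
    control also constrains the final heading.
    """
    pos_err = math.hypot(final.x - goal.x, final.y - goal.y)
    yaw_err = abs(wrap_angle(final.yaw - goal.yaw))
    return pos_err <= pos_tol and yaw_err <= yaw_tol


def success_rate(finals, goals, pos_tol: float, yaw_tol: float) -> float:
    """Fraction of episodes whose final pose satisfies both tolerances."""
    hits = sum(pose_reached(f, g, pos_tol, yaw_tol) for f, g in zip(finals, goals))
    return hits / len(finals)
```

Calling `success_rate` twice on the same set of episode outcomes, once with loose and once with tight tolerances, mirrors the kind of comparison the abstract reports between the standard and the stricter evaluation. The tolerance values themselves would have to come from the paper.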