AI Summary
To address trajectory drift in bipedal robots caused by model mismatch in reinforcement learning (RL) and model-based control, this paper proposes an embedded differentiable system identification framework. Methodologically, system identification is integrated into the RL training loop using the MuJoCo-XLA differentiable simulator; it performs end-to-end joint optimization of rigid-body parameters (mass, inertia) and a neural-network-parameterized nonlinear friction model, relying solely on joint position, velocity, and control input data, without torque sensors. The key contribution is the first demonstration of differentiable system identification co-optimized with RL without physical force/torque measurements. Experiments show that the approach significantly suppresses trajectory drift, improves gait tracking accuracy, and enhances walking stability. This work establishes a new paradigm for data-driven, high-fidelity dynamical modeling in legged robotics.
Abstract
Accurate system identification is crucial for reducing trajectory drift in bipedal locomotion, particularly in reinforcement learning and model-based control. In this paper, we present a novel control framework that integrates system identification into the reinforcement learning training loop using differentiable simulation. Unlike traditional approaches that rely on direct torque measurements, our method estimates system parameters using only trajectory data (positions, velocities) and control inputs. We leverage the differentiable simulator MuJoCo-XLA to optimize system parameters, ensuring that simulated robot behavior closely aligns with real-world motion. This framework enables scalable and flexible parameter optimization. It supports fundamental physical properties such as mass and inertia. Additionally, it handles complex nonlinear system behaviors, including advanced friction models, through neural network approximations. Experimental results show that our framework significantly improves trajectory following.
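The idea of identifying parameters from trajectories and control inputs alone can be illustrated with a minimal sketch. It is not the paper's implementation and does not use MuJoCo-XLA: it assumes a hypothetical single joint obeying m * dv/dt = u - b * v, where the control input u is taken to be the applied torque, with explicit-Euler integration and hand-written gradients standing in for the automatic differentiation a differentiable simulator would provide. All names (step, rollout, identify), the time step, and the true parameter values are illustrative assumptions.

```python
import math

DT = 0.05  # integration step in seconds (assumed for this toy example)


def step(v, u, inv_mass, damping):
    """One explicit-Euler step of m*dv/dt = u - b*v.

    The model is parameterized by inv_mass = 1/m and damping = b/m,
    which makes the predicted acceleration linear in the parameters.
    """
    return v + DT * (inv_mass * u - damping * v)


def rollout(controls, inv_mass, damping, v0=0.0):
    """Simulate joint velocities for a sequence of control inputs."""
    vs = [v0]
    for u in controls:
        vs.append(step(vs[-1], u, inv_mass, damping))
    return vs


def loss_and_grad(vs, controls, inv_mass, damping):
    """Mean squared one-step acceleration error and its gradient.

    Only velocities and controls are used; no torque measurement appears.
    The gradient is derived by hand (an assumption of this sketch).
    """
    n = len(controls)
    loss = g_inv_mass = g_damping = 0.0
    for t in range(n):
        acc_obs = (vs[t + 1] - vs[t]) / DT  # finite-difference acceleration
        r = inv_mass * controls[t] - damping * vs[t] - acc_obs
        loss += r * r / n
        g_inv_mass += 2.0 * r * controls[t] / n
        g_damping += -2.0 * r * vs[t] / n
    return loss, g_inv_mass, g_damping


def identify(vs, controls, steps=2000, lr=0.5):
    """Plain gradient descent from a deliberately wrong initial guess."""
    inv_mass, damping = 1.0, 0.0
    for _ in range(steps):
        _, g_a, g_c = loss_and_grad(vs, controls, inv_mass, damping)
        inv_mass -= lr * g_a
        damping -= lr * g_c
    mass = 1.0 / inv_mass
    return mass, damping * mass  # recover (m, b)


# Synthetic "real" data generated with hypothetical ground-truth parameters.
TRUE_MASS, TRUE_FRICTION = 2.0, 0.5
controls = [math.sin(k * DT) for k in range(400)]
observed = rollout(controls, 1.0 / TRUE_MASS, TRUE_FRICTION / TRUE_MASS)

mass_hat, friction_hat = identify(observed, controls)
print(f"estimated mass={mass_hat:.4f}, friction={friction_hat:.4f}")
```

Because the synthetic data come from the same integrator, the loss is exactly quadratic in the two parameters and gradient descent recovers them. A real robot would add sensor noise, contacts, and the nonlinear friction effects that the abstract proposes to model with a neural network.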