Shield-Loco: Shielding Locomotion Policies with Predictive Safety Filtering

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem that reinforcement learning (RL) locomotion policies have no mechanism for respecting safety constraints that were absent during training, which poses risks in real-world environments. The authors propose a predictive safety filter that filters the nominal contact locations fed to the RL policy: when a collision is predicted, a sampling-based optimizer asynchronously searches for safer contact sequences using a full-physics model, with a learned value function bootstrapping long-horizon returns. Three algorithmic components (geometric projection of sampled contacts, momentum-augmented updates, and replica exchange) make the optimization tractable in a discontinuous contact landscape. Experiments on a quadruped robot in dense, cluttered environments, both in simulation and on physical hardware, show substantial reductions in safety violations with minimal deviation from the nominal input.
📝 Abstract
Reinforcement learning (RL) policies enable dynamic legged locomotion but lack mechanisms to avoid violations of safety constraints that are absent during training. Large-scale offline safe learning is impractical for covering all edge cases. Existing safety frameworks either rely on reduced-order models that cannot reason about whole-body behaviors or require conservative recovery controllers that degrade task performance. We propose a predictive safety filter that post-hoc filters the nominal contact locations fed to the RL policy. When a collision is predicted, a sampling-based optimizer asynchronously searches for safer contact sequences using a full-physics model, while a learned value function bootstraps long-horizon returns. Our three algorithmic components (geometric projection of sampled contacts, momentum-augmented updates, and replica-exchange) make the optimization tractable in a discontinuous contact landscape. We validate the filter on a quadruped robot in dense, cluttered environments, both in simulation and in the real world, showing substantial reductions in safety violations with minimal deviation from the nominal input.
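To make the filtering loop described in the abstract more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes 2D contact points, obstacles modeled as well-separated keep-out discs `(cx, cy, r)`, and a cost equal to the squared deviation from the nominal contacts minus a placeholder `value_fn`. No physics model is simulated, the replica swap is a greedy simplification of replica exchange, and every name here (`filter_contacts`, `project_out`, `value_fn`, `sigmas`, `beta`) is hypothetical.

```python
import math
import random

def is_safe(contacts, obstacles):
    """True if no contact lies strictly inside any keep-out disc."""
    return all(math.hypot(x - cx, y - cy) >= r
               for (x, y) in contacts for (cx, cy, r) in obstacles)

def project_out(p, obstacles, margin=1e-3):
    """Geometric projection: push p radially out of any disc that contains it."""
    x, y = p
    for cx, cy, r in obstacles:
        dx, dy = x - cx, y - cy
        d = math.hypot(dx, dy)
        if d < r:
            if d == 0.0:  # at the disc centre: pick an arbitrary direction
                dx, dy, d = 1.0, 0.0, 1.0
            s = (r + margin) / d
            x, y = cx + dx * s, cy + dy * s
    return (x, y)

def cost(contacts, nominal, value_fn):
    """Squared deviation from the nominal contacts minus a terminal value estimate."""
    dev = sum((x - nx) ** 2 + (y - ny) ** 2
              for (x, y), (nx, ny) in zip(contacts, nominal))
    return dev - value_fn(contacts)

def filter_contacts(nominal, obstacles, value_fn=lambda c: 0.0,
                    n_replicas=3, n_samples=32, iters=30, beta=0.5, seed=0):
    # The filter stays inactive when the nominal contacts are already safe.
    if is_safe(nominal, obstacles):
        return list(nominal)
    rng = random.Random(seed)
    sigmas = [0.02 * 4 ** k for k in range(n_replicas)]  # one noise scale per replica
    start = [project_out(p, obstacles) for p in nominal]
    means = [list(start) for _ in sigmas]
    vels = [[(0.0, 0.0)] * len(nominal) for _ in sigmas]
    best, best_cost = list(start), cost(start, nominal, value_fn)
    for _ in range(iters):
        for k, sigma in enumerate(sigmas):
            # Sample around the replica mean and project every sample to safety.
            samples = [[project_out((x + rng.gauss(0, sigma),
                                     y + rng.gauss(0, sigma)), obstacles)
                        for (x, y) in means[k]] for _ in range(n_samples)]
            elite = min(samples, key=lambda c: cost(c, nominal, value_fn))
            # Momentum-augmented update of the mean towards the elite sample.
            vels[k] = [(beta * vx + (1 - beta) * (ex - mx),
                        beta * vy + (1 - beta) * (ey - my))
                       for (vx, vy), (ex, ey), (mx, my)
                       in zip(vels[k], elite, means[k])]
            means[k] = [project_out((mx + vx, my + vy), obstacles)
                        for (mx, my), (vx, vy) in zip(means[k], vels[k])]
            for cand in (elite, means[k]):
                c = cost(cand, nominal, value_fn)
                if c < best_cost and is_safe(cand, obstacles):
                    best, best_cost = list(cand), c
        # Greedy replica swap: hand the lower-cost mean to the colder replica.
        for k in range(n_replicas - 1):
            if (cost(means[k + 1], nominal, value_fn)
                    < cost(means[k], nominal, value_fn)):
                means[k], means[k + 1] = means[k + 1], means[k]
    return best
```

For example, `filter_contacts([(0.0, 0.0), (1.0, 0.0)], [(0.1, 0.0, 0.3)])` moves only the first contact, which starts inside the disc, while the second stays at or near its nominal location. In the paper's setting the cost would presumably come from full-physics rollouts and the learned value function rather than this geometric stand-in.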
Problem

Research questions and friction points this paper is trying to address.

safety constraints
legged locomotion
reinforcement learning
collision avoidance
whole-body dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

predictive safety filtering
reinforcement learning
legged locomotion
full-physics model
contact optimization
Aditya Shirwatkar
PhD Student, IISc Bangalore
Robotics, Robot Learning, Legged Locomotion
Sebastian Sanokowski
Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Munich, Germany
Shishir Kolathaya
Robert Bosch Center for Cyber Physical Systems, Indian Institute of Science, Bangalore, India; Department of Computer Science & Automation, Indian Institute of Science, Bangalore, India
Aaron Johnson
U.S. Naval Research Laboratory
privacy, security, anonymity, distributed systems
Majid Khadiv
Assistant Professor, TUM
Robotics