Shield-Loco: Shielding Locomotion Policies with Predictive Safety Filtering

📅 2026-06-05

🤖 AI Summary

This work addresses the problem that reinforcement learning locomotion policies have no mechanism for respecting safety constraints that were absent during training, which poses risks in real-world dynamic environments. The authors propose a predictive safety filter that adjusts the nominal contact locations fed to the policy: when a collision is predicted, it asynchronously searches for safer contact sequences by combining a full-physics model with a sampling-based optimizer, using a learned value function to bootstrap long-horizon returns. Three algorithmic components (geometric projection of sampled contacts, momentum-augmented updates, and replica exchange) make the optimization tractable in a discontinuous contact landscape. In dense, cluttered environments, a quadruped robot shows substantially fewer safety violations, both in simulation and on physical hardware, with minimal deviation from the nominal input.

📝 Abstract

Reinforcement learning (RL) policies enable dynamic legged locomotion but lack mechanisms to avoid violations of safety constraints that are absent during training. Large-scale offline safe learning is impractical for covering all edge cases. Existing safety frameworks either rely on reduced-order models that cannot reason about whole-body behaviors or require conservative recovery controllers that degrade task performance. We propose a predictive safety filter that post-hoc filters the nominal contact locations fed to the RL policy. When a collision is predicted, a sampling-based optimizer asynchronously searches for safer contact sequences using a full-physics model, while a learned value function bootstraps long-horizon returns. Our three algorithmic components (geometric projection of sampled contacts, momentum-augmented updates, and replica-exchange) make the optimization tractable in a discontinuous contact landscape. We validate the filter on a quadruped robot in dense, cluttered environments, both in simulation and in the real world, showing substantial reductions in safety violations with minimal deviation from the nominal input.
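
To make the three components named in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a single 2D contact, one circular keep-out region, and no physics rollout. The obstacle, the cost, the linear "value" term, the temperatures, and every function name are hypothetical stand-ins chosen for illustration. The sample weighting is a generic softmin (MPPI-style) rule, which the abstract does not specify.

```python
import math
import random

OBSTACLE = (0.0, 0.0, 0.5)  # hypothetical keep-out disc: (center_x, center_y, radius)


def project(p):
    """Geometric projection: move a contact inside the disc to the nearest boundary point."""
    cx, cy, r = OBSTACLE
    dx, dy = p[0] - cx, p[1] - cy
    d = math.hypot(dx, dy)
    if d >= r:
        return p
    if d < 1e-9:  # degenerate sample at the center: pick an arbitrary direction
        dx, dy, d = 1.0, 0.0, 1.0
    return (cx + r * dx / d, cy + r * dy / d)


def cost(p, nominal):
    """Toy objective: deviation from the nominal contact minus a stand-in value term."""
    deviation = (p[0] - nominal[0]) ** 2 + (p[1] - nominal[1]) ** 2
    value = 0.2 * p[0]  # hypothetical stand-in for a learned value function
    return deviation - value


def filter_contact(nominal, iters=40, n_samples=32, temps=(0.01, 0.1, 1.0),
                   beta=0.5, seed=0):
    rng = random.Random(seed)
    means = [project(nominal) for _ in temps]   # one replica per temperature
    vels = [(0.0, 0.0) for _ in temps]          # momentum state per replica
    best = means[0]
    for _ in range(iters):
        for k, temp in enumerate(temps):
            sigma = math.sqrt(temp)
            mx, my = means[k]
            # Sample contacts around the replica mean and project each one to safety.
            samples = [project((rng.gauss(mx, sigma), rng.gauss(my, sigma)))
                       for _ in range(n_samples)]
            costs = [cost(s, nominal) for s in samples]
            c_min = min(costs)
            weights = [math.exp(-(c - c_min) / temp) for c in costs]
            total = sum(weights)
            tx = sum(w * s[0] for w, s in zip(weights, samples)) / total
            ty = sum(w * s[1] for w, s in zip(weights, samples)) / total
            # Momentum-augmented update toward the weighted sample mean.
            vx = beta * vels[k][0] + (1.0 - beta) * (tx - mx)
            vy = beta * vels[k][1] + (1.0 - beta) * (ty - my)
            vels[k] = (vx, vy)
            means[k] = project((mx + vx, my + vy))
            candidate = min(samples + [means[k]], key=lambda s: cost(s, nominal))
            if cost(candidate, nominal) < cost(best, nominal):
                best = candidate
        # Replica exchange: Metropolis swap between adjacent temperatures.
        for k in range(len(temps) - 1):
            gap = cost(means[k], nominal) - cost(means[k + 1], nominal)
            log_accept = (1.0 / temps[k] - 1.0 / temps[k + 1]) * gap
            if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
                means[k], means[k + 1] = means[k + 1], means[k]
                vels[k], vels[k + 1] = vels[k + 1], vels[k]
    return best


nominal = (0.1, 0.2)            # nominal contact that falls inside the obstacle
safe = filter_contact(nominal)  # filtered contact on or outside the disc boundary
```

In this sketch, projection keeps every sample feasible, so the optimizer never spends evaluations on colliding contacts. The hot replicas explore widely and the swap step lets a good solution migrate to the cold replica. In the paper's setting the cost would come from full-physics rollouts of contact sequences rather than a closed-form expression.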

Problem

Research questions and friction points this paper is trying to address.

safety constraints

legged locomotion

reinforcement learning

collision avoidance

whole-body dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

predictive safety filtering

reinforcement learning

legged locomotion