TAIL-Safe: Task-Agnostic Safety Monitoring for Imitation Learning Policies

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Imitation learning policies are often sensitive to initial conditions and suffer from irreducible approximation errors, which leaves them vulnerable to out-of-distribution perturbations and without formal safety guarantees. To address these limitations, this work proposes TAIL-Safe, a task-agnostic safety monitoring framework that learns a Lipschitz-continuous Q-function over state-action pairs, grounded in three criteria (visibility, recognizability, and graspability), whose zero-superlevel set defines an empirical control-invariant set. Inspired by Nagumo’s theorem, the method adds a recovery mechanism that uses gradient ascent on the Q-function to steer the policy back into the safe set whenever the nominal policy proposes an action outside it. The Q-function is trained on failure data collected in a high-fidelity Gaussian Splatting digital twin, without risk to physical hardware. In experiments on a Franka Emika robot, a flow-matching policy that fails under run-time perturbations achieves consistent task success when guided by TAIL-Safe.

📝 Abstract

Recent imitation learning (IL) algorithms such as flow-matching and diffusion policies demonstrate remarkable performance in learning complex manipulation tasks. However, these policies often fail even when operating within their training distribution, because extreme sensitivity to initial conditions and irreducible approximation errors lead to compounding drift. This makes it unsafe to deploy IL policies in the field, where out-of-distribution scenarios are prevalent. A prerequisite for safe deployment is enabling the policy to determine whether it can execute a task the way it was learned from demonstrations. This paper presents TAIL-Safe, a principled approach to identify, for a trained IL policy, a safe set from which the policy empirically succeeds in completing the learned task. We propose a Lipschitz-continuous Q-value function that maps state-action pairs to a long-term safety score based on three short-term task-agnostic criteria: visibility, recognizability, and graspability. The zero-superlevel set of this function characterizes an empirical control invariant set over state-action pairs. When the nominal policy proposes an action outside this set, we apply a recovery mechanism inspired by Nagumo's theorem that uses gradient ascent on the Q-function to steer the policy back to safety. To learn this Q-function, we construct a high-fidelity digital twin using Gaussian Splatting that enables systematic collection of failure data without risk to physical hardware. Experiments with a Franka Emika robot demonstrate that flow-matching policies, which fail under run-time perturbations, achieve consistent task success when guided by TAIL-Safe.
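
The monitoring-and-recovery loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_q` is a hypothetical hand-written stand-in for the learned Q-function, the finite-difference gradient replaces whatever differentiation the paper uses, and the step size and iteration cap are arbitrary assumptions. The sketch only shows the general pattern: accept the nominal action when Q(s, a) >= 0, otherwise move the action along the gradient of Q until it re-enters the zero-superlevel set.

```python
import numpy as np


def toy_q(state, action):
    """Hypothetical stand-in for the learned safety Q-function.

    Non-negative when the next position (state + action) lies within
    unit distance of the origin; negative otherwise.
    """
    nxt = state + action
    return 1.0 - float(nxt @ nxt)


def grad_wrt_action(q, state, action, eps=1e-5):
    """Central finite-difference gradient of q with respect to the action."""
    grad = np.zeros_like(action)
    for i in range(action.size):
        delta = np.zeros_like(action)
        delta[i] = eps
        grad[i] = (q(state, action + delta) - q(state, action - delta)) / (2 * eps)
    return grad


def safety_filter(q, state, action, step=0.1, max_iters=100):
    """Return (action, was_modified, is_safe).

    The nominal action passes through unchanged if q(state, action) >= 0.
    Otherwise gradient ascent on q adjusts the action until it is safe
    or the iteration cap (an assumption of this sketch) is reached.
    """
    action = np.asarray(action, dtype=float).copy()
    modified = False
    for _ in range(max_iters):
        if q(state, action) >= 0.0:
            return action, modified, True
        action = action + step * grad_wrt_action(q, state, action)
        modified = True
    return action, modified, q(state, action) >= 0.0


if __name__ == "__main__":
    state = np.array([0.5, 0.0])
    nominal = np.array([1.0, 0.0])  # would leave the toy safe set
    safe_action, modified, ok = safety_filter(toy_q, state, nominal)
    print(safe_action, modified, ok)
```

With a learned neural Q-function, the gradient with respect to the action would more naturally come from automatic differentiation than from finite differences; the filtering logic stays the same.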

Problem

Research questions and friction points this paper is trying to address.

Imitation Learning

Safety Monitoring

Out-of-Distribution

Policy Failure

Task Execution Safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

Imitation Learning Safety

Task-Agnostic Monitoring

Control Invariant Set