🤖 AI Summary
This work addresses the unpredictable deployment-time failures of generative robot policies, which hesitate at critical moments, drift off-task, or commit to unrecoverable actions. Existing online failure detectors either require access to internal policy states or incur substantial runtime overhead. We propose ActProbe, a lightweight, action-space-only failure detector that uses two signals available from a single forward pass: the temporal consistency error (TCE) between consecutive action chunks and the action chunk magnitude (ACM) of the current chunk. A task-conditioned LSTM-MLP architecture maps these signals to per-step failure probabilities. Our analysis shows that emitted action chunks alone carry strong predictive signal for impending failures, enabling early warnings without white-box access to the policy or additional observation-side signals. Across a range of generative robot policies and benchmarks, ActProbe improves the F1–timeliness Pareto frontier by an average hypervolume gain of 12.7% over internal- and external-feature baselines and achieves a 9.0% higher early-detection ROC-AUC on unseen tasks. It also predicts failures on unseen real-robot pick tasks and accelerates PPO fine-tuning with 2.9× fewer environment interactions.
📝 Abstract
Generative robot policies fail unpredictably at deployment: they hesitate at critical moments, drift off-task, or commit to unrecoverable actions. Existing online failure detectors either require white-box access to policy internals or add runtime overhead through resampling and observation-side signals. Our empirical analysis shows that emitted action chunks themselves already carry strong predictive signal for impending failures in generative robot policies. Motivated by this observation, we introduce ActProbe, a lightweight, pure action-space detector that uses two compact signals available from a single forward pass: Temporal Consistency Error (TCE) between consecutive action chunks and Action Chunk Magnitude (ACM) of the current chunk. ActProbe maps these signals to per-step failure probabilities with a task-conditioned LSTM-MLP architecture. Across a diverse suite of generative robot policies and benchmarks, ActProbe raises alerts before failures become visually recognizable, improving the accuracy (F1)-timeliness Pareto frontier of failure detection by an average hypervolume gain of +12.7% over both internal- and external-feature baselines, with a +9.0% early-detection ROC-AUC lead on unseen tasks. ActProbe further transfers to deployment, predicting failures on unseen real-robot pick tasks and accelerating RL fine-tuning (PPO) with 2.9x fewer environment interactions.
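To make the two action-space signals concrete, the sketch below computes one plausible version of each from a stream of action chunks. The abstract does not give exact formulas, so everything here is an assumption for illustration: TCE is taken as the mean L2 gap between the overlapping steps of two consecutive chunks, ACM as the mean L2 norm of the actions in the current chunk, and the names `temporal_consistency_error`, `action_chunk_magnitude`, `action_features`, and the `stride` parameter (steps executed before the policy replans) are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two action vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def temporal_consistency_error(prev_chunk, curr_chunk, stride):
    """Assumed TCE: mean L2 gap over the steps both chunks predict.

    prev_chunk covers steps t .. t+H-1 and curr_chunk covers
    t+stride .. t+stride+H-1, so they overlap on H - stride steps.
    """
    horizon = len(prev_chunk)
    if not 0 < stride < horizon:
        raise ValueError("stride must satisfy 0 < stride < horizon")
    overlap_prev = prev_chunk[stride:]
    overlap_curr = curr_chunk[: horizon - stride]
    gaps = [l2(p, c) for p, c in zip(overlap_prev, overlap_curr)]
    return sum(gaps) / len(gaps)


def action_chunk_magnitude(chunk):
    """Assumed ACM: mean L2 norm of the actions in one chunk."""
    zero = [0.0] * len(chunk[0])
    return sum(l2(action, zero) for action in chunk) / len(chunk)


def action_features(chunks, stride):
    """Per-chunk (TCE, ACM) pairs; TCE is set to 0.0 for the first chunk."""
    features = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            tce = 0.0  # no previous chunk to compare against
        else:
            tce = temporal_consistency_error(chunks[i - 1], chunk, stride)
        features.append((tce, action_chunk_magnitude(chunk)))
    return features


# Two 4-step chunks of 2-D actions, replanned every 2 steps.
prev = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
curr = [[2.0, 3.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
features = action_features([prev, curr], stride=2)
```

In a detector of the kind the abstract describes, a sequence of such feature pairs would be fed to the task-conditioned LSTM-MLP to produce per-step failure probabilities. That model is not sketched here because the abstract does not specify its inputs, conditioning, or training objective.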