ActProbe: Action-Space Probe for Early Failure Detection of Generative Robot Policies

📅 2026-06-07
🤖 AI Summary
This work addresses the sudden failures of generative robot policies during deployment, which often stem from hesitation, deviation, or irreversible actions. Existing online detection methods either rely on internal policy states or incur substantial computational overhead. We propose ActProbe, a lightweight, action-space-only failure detector that needs only a single forward pass to obtain an action chunk and uses two signals: Temporal Consistency Error (TCE) between consecutive action chunks and Action Chunk Magnitude (ACM) of the current chunk. A task-conditioned LSTM-MLP architecture maps these signals to per-step failure probabilities in real time. Our analysis shows that action sequences alone contain strong precursors to failure, enabling early warnings without access to environmental observations or policy internals. Experiments show that ActProbe improves the F1–timeliness Pareto hypervolume by 12.7% on average over baselines and achieves a 9.0% higher early-detection ROC-AUC on unseen tasks. It also transfers to unseen real-robot pick tasks and reduces PPO fine-tuning environment interactions by 2.9×.
📝 Abstract
Generative robot policies fail unpredictably at deployment: they hesitate at critical moments, drift off-task, or commit to unrecoverable actions. Existing online failure detectors either require white-box access to policy internals or add runtime overhead through resampling and observation-side signals. Our empirical analysis shows that emitted action chunks themselves already carry strong predictive signal for impending failures in generative robot policies. Motivated by this observation, we introduce ActProbe, a lightweight, pure action-space detector that uses two compact signals available from a single forward pass: Temporal Consistency Error (TCE) between consecutive action chunks and Action Chunk Magnitude (ACM) of the current chunk. ActProbe maps these signals to per-step failure probabilities with a task-conditioned LSTM-MLP architecture. Across a diverse suite of generative robot policies and benchmarks, ActProbe raises alerts before failures become visually recognizable, improving the accuracy (F1)-timeliness Pareto frontier of failure detection by an average hypervolume gain of +12.7% over both internal- and external-feature baselines, with a +9.0% early-detection ROC-AUC lead on unseen tasks. ActProbe further transfers to deployment, predicting failures on unseen real-robot pick tasks and accelerating RL fine-tuning (PPO) with 2.9x fewer environment interactions.
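The two signals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: TCE is taken as the mean Euclidean distance between the previous and current chunk on the time steps both chunks predict, and ACM as the mean Euclidean norm of the actions in the current chunk. The function names and the `stride` parameter (how many steps are executed before the policy replans) are hypothetical.

```python
import math
from typing import Sequence

# A chunk is H actions, each a D-dimensional vector (assumed layout).
Chunk = Sequence[Sequence[float]]


def _l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two action vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def temporal_consistency_error(prev_chunk: Chunk, curr_chunk: Chunk, stride: int) -> float:
    """Assumed TCE: mean distance over the time steps both chunks predict.

    After executing `stride` actions, prev_chunk[stride + i] and
    curr_chunk[i] refer to the same future time step.
    """
    overlap = min(len(prev_chunk) - stride, len(curr_chunk))
    if overlap <= 0:
        return 0.0  # no shared time steps, nothing to compare
    dists = [_l2(prev_chunk[stride + i], curr_chunk[i]) for i in range(overlap)]
    return sum(dists) / overlap


def action_chunk_magnitude(chunk: Chunk) -> float:
    """Assumed ACM: mean Euclidean norm of the actions in the chunk."""
    zero = [0.0] * len(chunk[0])
    return sum(_l2(action, zero) for action in chunk) / len(chunk)


# Toy example: two 3-step chunks of 2-D actions, replanning every step.
prev = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
curr = [[1.0, 3.0], [2.0, 4.0], [0.0, 0.0]]
tce = temporal_consistency_error(prev, curr, stride=1)
acm = action_chunk_magnitude(curr)
```

Under these assumptions, each policy call yields one (TCE, ACM) pair at negligible cost, and a sequence model such as the task-conditioned LSTM-MLP described above would consume the resulting time series.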
Problem

Research questions and friction points this paper is trying to address.

generative robot policies
failure detection
action-space probe
online monitoring
deployment reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

ActProbe
failure detection
generative robot policies
action-space probing
Temporal Consistency Error
Bingjia Huang
Institute for AI Industry Research (AIR), Tsinghua University; University of Electronic Science and Technology of China
Xiangyu Li
Institute for AI Industry Research (AIR), Tsinghua University
Machine Learning System, Mobile Computing, Large Language Models
Xiang Wang
Institute for AI Industry Research (AIR), Tsinghua University
Liang Mi
Nanjing University
Zixu Hao
Institute for AI Industry Research (AIR), Tsinghua University
Weijun Wang
Tsinghua University
LLM Serving System, Edge AI, Video Analytics System
Hao Wu
Nanjing University
Kun Li
Institute for AI Industry Research (AIR), Tsinghua University
Yunxin Liu
IEEE Fellow, Guoqiang Professor, Institute for AI Industry Research (AIR), Tsinghua University
Mobile Computing, Edge Computing, AIoT, System, Networking
Ting Cao
Institute for AI Industry Research (AIR), Tsinghua University