UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

📅 2026-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing human-in-the-loop reinforcement learning (HiL-RL) methods rely heavily on frequent human interventions to correct inefficient exploration, which makes them costly and hard to scale. This work proposes UniIntervene, a framework that shifts most of the intervention burden from human operators to the agent itself. Using future-conditioned action-value estimation and a temporal value-risk critic, the agent autonomously detects when its policy stagnates, retrieves a high-value recovery target from a memory of past interventions, and generates corrective actions with a goal-conditioned recovery policy. Intervention thus changes from passive human correction into proactive, value-aware recovery. Across diverse real-world manipulation tasks, UniIntervene improves the average success rate by 8.6% and reduces human interventions by 57% compared with state-of-the-art HiL-RL methods.
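The trigger idea (intervene when the estimated value shows sustained stagnation or degradation) can be illustrated with a minimal sliding-window sketch. This is not UniIntervene's critic, which the summary describes as a learned temporal value-risk critic; the class name, the window length, the least-squares slope test, and the drop-from-peak threshold are all assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque


class ValueRiskTrigger:
    """Hypothetical hand-written stand-in for a temporal value-risk critic.

    Fires when the recent value estimates either stagnate (small average
    per-step gain) or degrade (fall well below the recent peak).
    """

    def __init__(self, window: int = 5, min_slope: float = 0.01, max_drop: float = 0.1) -> None:
        if window < 2:
            raise ValueError("window must be at least 2 to estimate a trend")
        self.window = window
        self.min_slope = min_slope  # assumed: per-step gain below this counts as stagnation
        self.max_drop = max_drop    # assumed: drop from the window peak counted as degradation
        self.values: Deque[float] = deque(maxlen=window)

    def _slope(self) -> float:
        # Least-squares slope of value versus time step over the current window.
        n = len(self.values)
        mean_t = (n - 1) / 2
        mean_v = sum(self.values) / n
        num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(self.values))
        den = sum((t - mean_t) ** 2 for t in range(n))
        return num / den

    def update(self, value: float) -> bool:
        """Record the newest value estimate; return True if intervention should fire."""
        self.values.append(value)
        if len(self.values) < self.window:
            return False  # not enough history yet to call a trend "sustained"
        stagnating = self._slope() < self.min_slope
        degrading = max(self.values) - self.values[-1] > self.max_drop
        return stagnating or degrading


if __name__ == "__main__":
    rising = ValueRiskTrigger(window=5)
    print([rising.update(v) for v in [0.1, 0.2, 0.3, 0.4, 0.5]])
    flat = ValueRiskTrigger(window=5)
    print([flat.update(0.5) for _ in range(5)])
```

Requiring a full window before firing is what makes the test "sustained": a single noisy dip cannot trigger intervention, at the cost of reacting a few steps later.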
📝 Abstract
Human-in-the-loop reinforcement learning (HiL-RL) has emerged as an effective paradigm for real-world robotic manipulation, enabling online policy improvement with human guidance. However, current HiL-RL frameworks remain intervention-intensive, relying on frequent human corrections to redirect the policy out of unproductive exploration, which incurs high labor cost and limits real-world scalability. To address this, we propose UniIntervene, an agentic intervention model that detects unproductive exploration and autonomously recovers the policy toward high-value states, taking over the bulk of interventions from human operators. Specifically, UniIntervene first performs future-conditioned action-value estimation, predicting the latent consequence of the current action and evaluating its induced value, which provides a more stable progress signal. Building on this, a temporal value-risk critic aggregates recent value dynamics and triggers intervention when the estimated value exhibits sustained stagnation or degradation. When intervention is required, UniIntervene retrieves a high-value recovery target from a memory of past intervention episodes and produces executable corrective actions through a goal-conditioned recovery policy. In this way, UniIntervene turns intervention from passive human correction into a value-aware recovery process for efficient real-world RL. Extensive experiments on diverse real-world manipulation tasks demonstrate that UniIntervene improves the average success rate by 8.6% while reducing human interventions by 57% relative to state-of-the-art HiL-RL baselines.
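The retrieve-then-recover step can be sketched under simple assumptions. Here the intervention memory is a hypothetical list of (state, value) pairs collected during past intervention episodes, a recovery target is the nearest stored state whose value exceeds the current estimate, and a clipped proportional step toward that target stands in for the learned goal-conditioned recovery policy. None of these names or selection rules come from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class MemoryEntry:
    state: List[float]  # state (or latent) vector recorded during a past intervention
    value: float        # value estimate associated with that state


@dataclass
class InterventionMemory:
    """Hypothetical memory of past intervention episodes."""

    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, state: Sequence[float], value: float) -> None:
        self.entries.append(MemoryEntry(list(state), value))

    def retrieve(self, state: Sequence[float], current_value: float) -> Optional[MemoryEntry]:
        """Return the nearest stored state whose value exceeds the current value."""
        better = [e for e in self.entries if e.value > current_value]
        if not better:
            return None  # no higher-value target known; a human operator would take over
        return min(better, key=lambda e: math.dist(e.state, state))


def recovery_action(state: Sequence[float], goal: Sequence[float], max_step: float = 0.1) -> List[float]:
    """Hypothetical goal-conditioned stand-in: step toward the goal, clipped per dimension."""
    return [max(-max_step, min(max_step, g - s)) for s, g in zip(state, goal)]


if __name__ == "__main__":
    memory = InterventionMemory()
    memory.add([0.0, 0.0], 0.2)
    memory.add([1.0, 1.0], 0.9)
    memory.add([0.3, 0.2], 0.6)
    target = memory.retrieve([0.2, 0.1], current_value=0.4)
    if target is not None:
        print(target.state, recovery_action([0.2, 0.1], target.state))
```

Preferring the nearest higher-value state, rather than the single highest-value one, keeps the recovery target reachable from where the policy currently is; this trade-off is an illustrative choice, not one the abstract specifies.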
Problem

Research questions and friction points this paper is trying to address.

Human-in-the-loop reinforcement learning
intervention-intensive
real-world robotic manipulation
unproductive exploration
human guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic intervention
future-conditioned action-value estimation
temporal value-risk critic
goal-conditioned recovery policy
human-in-the-loop reinforcement learning
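Future-conditioned action-value estimation, as the abstract describes it, scores an action by the value of its predicted latent consequence, roughly Q(z, a) ≈ V(f(z, a)) for a latent dynamics model f and a value function V. The sketch wires that composition together with toy stand-ins; the additive dynamics, the distance-to-goal value, and every function name here are hypothetical and only show the shape of the computation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Dynamics = Callable[[Sequence[float], Sequence[float]], Vector]
ValueFn = Callable[[Sequence[float]], float]


def future_conditioned_q(z: Sequence[float], action: Sequence[float],
                         dynamics: Dynamics, value_fn: ValueFn) -> float:
    """Score an action by the value of its predicted consequence: Q(z, a) ~ V(f(z, a))."""
    z_next = dynamics(z, action)  # predicted latent consequence of taking `action`
    return value_fn(z_next)       # value induced by that predicted future state


def select_action(z: Sequence[float], candidates: Sequence[Sequence[float]],
                  dynamics: Dynamics, value_fn: ValueFn) -> Sequence[float]:
    """Pick the candidate action whose predicted consequence has the highest value."""
    return max(candidates, key=lambda a: future_conditioned_q(z, a, dynamics, value_fn))


# Toy stand-ins (hypothetical, for illustration only).
GOAL = [1.0, 1.0]


def toy_dynamics(z: Sequence[float], a: Sequence[float]) -> Vector:
    return [zi + ai for zi, ai in zip(z, a)]  # assumed: the action shifts the latent directly


def toy_value(z: Sequence[float]) -> float:
    return -math.dist(z, GOAL)  # assumed: closer to the goal means higher value


if __name__ == "__main__":
    z0 = [0.0, 0.0]
    options = [[0.5, 0.5], [-0.5, 0.0]]
    print(select_action(z0, options, toy_dynamics, toy_value))
```

Evaluating the value of a predicted next state, instead of the current one, lets the estimate reflect what an action is about to do, which is one plausible reading of why the abstract calls it a more stable progress signal.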
💼 Related Jobs
Haoyuan Deng
Nanyang Technological University
Robotics, Imitation Learning, Reinforcement Learning

Yitong Gao
Nanyang Technological University

Yudong Lin
Nanyang Technological University

Haichao Liu
Nanyang Technological University

Zhenyu Wu
Beijing University of Posts and Telecommunications

Ziwei Wang
School of Electrical and Electronic Engineering, Nanyang Technological University
embodied AI, robotics, computer vision