🤖 AI Summary
This work addresses the dual challenges of catastrophic forgetting and the high cost of reward annotation in continual reinforcement learning by proposing a method that learns perception-driven, progress-aware rewards from a small number of unlabeled expert demonstration videos. The approach integrates a state-potential-based reward model with adversarial push-back regularization to mitigate distributional shift, and unifies reward learning, PPO, coreset experience replay, and synaptic intelligence within a natively differentiable JAX framework to enable efficient and stable lifelong learning. Evaluated on ContinualBench and Meta-World, the method significantly reduces forgetting, accelerates learning, and outperforms existing visual reward and continual learning baselines, even surpassing an idealized perfect-memory agent, while demonstrating strong few-shot skill acquisition on a real-world robotic platform.
📝 Abstract
We present ProgAgent, a continual reinforcement learning (CRL) agent that unifies progress-aware reward learning with a high-throughput, JAX-native system architecture. Lifelong robotic learning grapples with catastrophic forgetting and the high cost of reward specification. ProgAgent tackles both by deriving dense, shaped rewards from unlabeled expert videos through a perceptual model that estimates task progress across initial, current, and goal observations. We theoretically interpret this as a learned state-potential function, delivering robust guidance consistent with expert behavior. To maintain stability during online exploration, where novel, out-of-distribution states arise, we incorporate an adversarial push-back refinement that regularizes the reward model, curbing overconfident predictions on non-expert trajectories and countering distribution shift. By embedding this reward mechanism into a JIT-compiled loop, ProgAgent supports massively parallel rollouts and fully differentiable updates, making a sophisticated unified objective feasible: it merges PPO with coreset replay and synaptic intelligence for an improved stability-plasticity balance. Evaluations on the ContinualBench and Meta-World benchmarks highlight ProgAgent's advantages: it markedly reduces forgetting, boosts learning speed, and outperforms key baselines in visual reward learning (e.g., Rank2Reward, TCN) and continual learning (e.g., Coreset, SI), surpassing even an idealized perfect-memory agent. Real-robot trials further validate its ability to acquire complex manipulation skills from noisy, few-shot human demonstrations.
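The abstract's interpretation of the progress estimator as a learned state-potential function follows the standard potential-based shaping form, \(r_{\text{shaped}} = \gamma\,\phi(s') - \phi(s)\), which is known to preserve optimal policies. Below is a minimal JAX sketch of that idea; the `progress_potential` stub (distance-ratio scoring between initial, current, and goal observations) is a hypothetical stand-in for the paper's perceptual model, and all function names are ours, not ProgAgent's API.

```python
import jax.numpy as jnp

def progress_potential(obs0, obs_t, obs_goal):
    # Hypothetical stand-in for the learned perceptual model: scores task
    # progress in [0, 1] by comparing how much of the initial-to-goal
    # distance remains at the current observation.
    d_total = jnp.linalg.norm(obs_goal - obs0) + 1e-8
    d_left = jnp.linalg.norm(obs_goal - obs_t)
    return 1.0 - jnp.clip(d_left / d_total, 0.0, 1.0)

def shaped_reward(obs_t, obs_next, obs0, obs_goal, gamma=0.99):
    # Potential-based shaping: gamma * phi(s') - phi(s). Dense reward is
    # positive when the transition moves the agent toward the goal.
    phi_t = progress_potential(obs0, obs_t, obs_goal)
    phi_next = progress_potential(obs0, obs_next, obs_goal)
    return gamma * phi_next - phi_t
```

With a 1-D toy state space where `obs0 = 0` and `obs_goal = 1`, a step from `0.2` to `0.5` yields a positive shaped reward, while a step away from the goal yields a negative one.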
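The unified objective merges PPO with coreset replay and synaptic intelligence (SI). The SI component is typically a quadratic penalty anchoring parameters that were important for earlier tasks. A minimal JAX sketch of that regularizer, with all names hypothetical (the paper does not expose this API):

```python
import jax
import jax.numpy as jnp

def si_penalty(params, params_star, omega, c=0.1):
    # Synaptic-intelligence regularizer: per-parameter importance weights
    # `omega` (accumulated over earlier tasks) penalize drift away from
    # `params_star`, the parameter values at the end of the previous task.
    sq = jax.tree_util.tree_map(
        lambda p, p0, w: w * (p - p0) ** 2, params, params_star, omega)
    return c * sum(jnp.sum(leaf) for leaf in jax.tree_util.tree_leaves(sq))
```

In a combined loss this term would be added to the PPO objective (computed over both fresh rollouts and replayed coreset transitions), so gradients trade off plasticity on the current task against stability on previous ones.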