ProgAgent: A Continual RL Agent with Progress-Aware Rewards

📅 2026-03-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the dual challenges of catastrophic forgetting and the high cost of reward annotation in continual reinforcement learning by proposing a method that learns perception-driven, progress-aware rewards from a small number of unlabeled expert demonstration videos. The approach integrates a state-potential-based reward model with adversarial backward regularization to mitigate distributional shift, and unifies reward learning, PPO, coreset experience replay, and synaptic intelligence within a JAX-native, fully differentiable framework to enable efficient and stable lifelong learning. Evaluated on ContinualBench and Meta-World, the method significantly reduces forgetting, accelerates learning, and outperforms existing visual reward and continual learning baselines, even surpassing an idealized perfect-memory agent, while demonstrating strong few-shot skill acquisition on a real-world robotic platform.

📝 Abstract
We present ProgAgent, a continual reinforcement learning (CRL) agent that unifies progress-aware reward learning with a high-throughput, JAX-native system architecture. Lifelong robotic learning grapples with catastrophic forgetting and the high cost of reward specification. ProgAgent tackles these by deriving dense, shaped rewards from unlabeled expert videos through a perceptual model that estimates task progress across initial, current, and goal observations. We theoretically interpret this as a learned state-potential function, delivering robust guidance in line with expert behaviors. To maintain stability amid online exploration, where novel, out-of-distribution states arise, we incorporate an adversarial push-back refinement that regularizes the reward model, curbing overconfident predictions on non-expert trajectories and countering distribution shift. By embedding this reward mechanism into a JIT-compiled loop, ProgAgent supports massively parallel rollouts and fully differentiable updates, rendering a sophisticated unified objective feasible: it merges PPO with coreset replay and synaptic intelligence for an enhanced stability-plasticity balance. Evaluations on ContinualBench and Meta-World benchmarks highlight ProgAgent's advantages: it markedly reduces forgetting, boosts learning speed, and outperforms key baselines in visual reward learning (e.g., Rank2Reward, TCN) and continual learning (e.g., Coreset, SI), surpassing even an idealized perfect-memory agent. Real-robot trials further validate its ability to acquire complex manipulation skills from noisy, few-shot human demonstrations.
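The abstract frames the learned progress model as a state-potential function whose differences yield a dense shaped reward. A minimal JAX sketch of that idea is below; `phi_apply`, its `(params, init_obs, obs, goal_obs)` signature, and the toy distance-based potential are illustrative assumptions, not the paper's actual model or API.

```python
# Sketch: progress-aware reward as potential-based shaping,
# r(s, s') = gamma * Phi(s') - Phi(s), which is known to preserve
# optimal policies (Ng et al., 1999). All names here are hypothetical.
import jax
import jax.numpy as jnp


def make_progress_reward(phi_apply, params, gamma=0.99):
    """Build a JIT-compiled shaped-reward function from a progress
    potential conditioned on initial, current, and goal observations."""
    def reward(obs, next_obs, init_obs, goal_obs):
        phi_s = phi_apply(params, init_obs, obs, goal_obs)
        phi_next = phi_apply(params, init_obs, next_obs, goal_obs)
        return gamma * phi_next - phi_s
    return jax.jit(reward)


# Toy stand-in potential: negative distance to the goal observation.
def toy_phi(params, init_obs, obs, goal_obs):
    return -jnp.linalg.norm(goal_obs - obs)


reward_fn = make_progress_reward(toy_phi, params=None)
s = jnp.array([0.0, 0.0])
s_next = jnp.array([0.5, 0.0])  # a step toward the goal
goal = jnp.array([1.0, 0.0])
r = reward_fn(s, s_next, s, goal)  # positive: progress was made
```

Because the reward is a difference of potentials, it stays dense at every transition, which is what allows it to replace hand-specified task rewards inside the JIT-compiled rollout loop the abstract describes.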
Problem

Research questions and friction points this paper is trying to address.

catastrophic forgetting
reward specification
continual reinforcement learning
lifelong robotic learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual reinforcement learning
progress-aware reward
adversarial regularization
JAX-native architecture
state-potential function
Jinzhou Tan
University of California, San Diego
Gabriel Adineera
Texas A&M University-Commerce
Jinoh Kim
Texas A&M University-Commerce, Commerce, TX 75428, USA
Network telemetry and analytics
Networked systems and security
Big data computing and analytics