🤖 AI Summary
To address the low sample efficiency and insufficient use of prior knowledge in robot reinforcement learning (RL), this paper proposes a context-aware knowledge transfer framework. It quantifies task similarity between new environments and pre-trained policies via state-transition dynamics modeling, enabling automatic identification and prioritization of relevant knowledge. A plug-and-play adaptation mechanism ensures compatibility with diverse RL paradigms (policy-gradient, value-based, and actor-critic methods) while supporting robust sim-to-real transfer. Evaluated on the CarRacing and LunarLander benchmarks, the framework accelerates convergence by 2.3× on average and improves final performance. Real-robot experiments demonstrate efficient policy deployment on complex off-road terrain with minimal fine-tuning. The core contributions are (i) the first use of system-dynamics consistency as a context-aware transfer criterion, and (ii) a unified, generalizable paradigm for adaptive knowledge integration.
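The context-awareness idea summarized above, ranking prior policies by how well each one's learned dynamics predict the new environment's state transitions, can be sketched in miniature. This is an illustrative reconstruction, not the paper's implementation: the dynamics models, error metric, and softmax weighting below are all assumptions for the sake of a runnable example.

```python
import numpy as np

def dynamics_error(model, transitions):
    """Mean squared one-step prediction error of a dynamics model
    on observed (state, action, next_state) transitions."""
    errs = [np.sum((model(s, a) - s_next) ** 2)
            for s, a, s_next in transitions]
    return float(np.mean(errs))

def context_weights(prior_models, transitions, temperature=1.0):
    """Softmax over negative dynamics errors: priors whose dynamics
    better explain the new task's transitions receive more weight."""
    errs = np.array([dynamics_error(m, transitions) for m in prior_models])
    logits = -errs / temperature
    logits -= logits.max()                 # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Toy example: two hypothetical prior dynamics models (linear for brevity).
prior_a = lambda s, a: s + 0.10 * a        # consistent with the new task
prior_b = lambda s, a: s + 0.50 * a        # mismatched dynamics
rng = np.random.default_rng(0)
transitions = [(s, a, s + 0.10 * a)        # new task follows prior_a's dynamics
               for s, a in zip(rng.normal(size=(20, 2)),
                               rng.normal(size=(20, 2)))]
w = context_weights([prior_a, prior_b], transitions)
# w[0] > w[1]: knowledge from the consistent prior is prioritized.
```

The weights `w` could then gate how strongly each piece of prior knowledge (e.g., a pre-trained policy or value function) is blended into learning the new task; the paper's actual adaptation mechanism is not reproduced here.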
📝 Abstract
Using Reinforcement Learning (RL) to learn new robotic tasks from scratch is often inefficient. Leveraging prior knowledge has the potential to significantly enhance learning efficiency, but it raises two critical challenges: how to determine the relevance of existing knowledge, and how to adaptively integrate it into learning a new task. In this paper, we propose Context-aware Adaptation for Robot Learning (CARoL), a novel framework to efficiently learn a similar but distinct new task from prior knowledge. CARoL incorporates context awareness by analyzing state transitions in system dynamics to identify similarities between the new task and prior knowledge. It then uses these identified similarities to prioritize and adapt specific knowledge pieces for the new task. Additionally, CARoL has broad applicability spanning policy-based, value-based, and actor-critic RL algorithms. We validate the efficiency and generalizability of CARoL on both simulated robotic platforms and physical ground vehicles. The simulations include the CarRacing and LunarLander environments, where CARoL demonstrates faster convergence and higher rewards when learning policies for new tasks. In real-world experiments, we show that CARoL enables a ground vehicle to quickly and efficiently adapt policies learned in simulation to smoothly traverse real-world off-road terrain.