AI Summary
This work addresses the challenges of low sample efficiency and poor cross-task generalization in multi-task reinforcement learning by proposing RepMT-SAC, a novel framework that integrates spectral MDP decomposition with a structured value function design. The approach decouples the value function into a task-agnostic core representation and lightweight task-specific modules, enabling efficient knowledge sharing and transfer across tasks. It supports zero-shot transfer to in-distribution tasks and rapid few-shot adaptation to out-of-distribution tasks. Empirical evaluation on quadrotor trajectory tracking demonstrates up to a 30% performance improvement over baseline methods, confirming the framework's effectiveness and robustness.
Abstract
Reinforcement learning has achieved remarkable success in learning complex control policies, yet its applicability remains limited due to sample inefficiency and poor generalization across tasks. In this work, we propose RepMT-SAC, a framework for multi-task RL that enables efficient knowledge sharing and robust transfer to new tasks. RepMT-SAC uses spectral MDP decomposition to capture transferable dynamics, structuring the value function into a task-agnostic core with a minimal task-specific adjustment. This design allows for strong zero-shot performance on in-distribution tasks and rapid few-shot adaptation to out-of-distribution tasks. We evaluate RepMT-SAC on quadcopter trajectory-following tasks across in-distribution and out-of-distribution contexts, demonstrating that it outperforms baselines by up to 30%.
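The split between a task-agnostic core and a minimal task-specific adjustment can be illustrated with a small sketch. It assumes the simplest possible reading, in which each task's Q-function is linear in a shared feature map, Q_task(s, a) = phi(s, a) · w_task, and only the vector w_task changes per task. The random feature map, the ridge-regression fit, and every name below (`shared_features`, `fit_task_head`, `proj`) are illustrative assumptions, not the RepMT-SAC implementation.

```python
import numpy as np

def shared_features(state, action, proj):
    """Task-agnostic features phi(s, a). A fixed random projection stands in
    for a learned representation (an assumption of this sketch)."""
    x = np.concatenate([state, action])
    return np.tanh(proj @ x)

def q_value(state, action, proj, task_head):
    """Q_task(s, a) = phi(s, a) . w_task: shared core, small per-task vector."""
    return float(shared_features(state, action, proj) @ task_head)

def fit_task_head(features, targets, ridge=1e-6):
    """Few-shot adaptation: ridge regression for w_task with phi frozen."""
    dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(dim)
    return np.linalg.solve(gram, features.T @ targets)

rng = np.random.default_rng(0)
state_dim, action_dim, feat_dim = 4, 2, 8
proj = rng.normal(size=(feat_dim, state_dim + action_dim))

# Synthetic "new task": Q targets produced by an unknown head over the same features.
true_head = rng.normal(size=feat_dim)
states = rng.normal(size=(64, state_dim))
actions = rng.normal(size=(64, action_dim))
feats = np.stack([shared_features(s, a, proj) for s, a in zip(states, actions)])
targets = feats @ true_head

# Adapt to the new task by refitting only the small head.
new_head = fit_task_head(feats, targets)
max_err = np.max(np.abs(feats @ new_head - targets))
```

In this toy setting, adapting to a new task means solving for 8 numbers from 64 samples while the feature map stays fixed, which is the intuition behind rapid few-shot adaptation; a real system would learn the representation and train the heads inside an actor-critic loop.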