Skill Reuse as Compression in Agentic RL

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the tendency of large language model agents trained with reinforcement learning to acquire brittle, task-specific shortcut strategies that hinder generalization. It formalizes skill reuse as a trajectory compression problem and introduces ReuseRL, a framework grounded in the Minimum Description Length (MDL) principle. ReuseRL extracts a reusable skill dictionary from successful trajectories and adds a segmentation-based compression cost to the reinforcement learning objective, which favors concise, transferable behavioral structures. Theoretical analysis yields a PAC-Bayes generalization bound for this compression penalty. Empirically, ReuseRL outperforms vanilla GRPO and strong baselines on the ALFWorld, TextWorld-Cooking, and Countdown-Stepwise benchmarks, with higher success rates both in-distribution and out-of-distribution.
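For orientation on why a description-length penalty can connect to generalization, the block below gives the standard Occam-style bound for a prefix-free code, which is the textbook special case of PAC-Bayes with a point-mass posterior. This is not the paper's theorem, whose exact statement is not reproduced in this summary. The symbols are assumptions introduced here: $h$ is a hypothesis (for example a policy together with its skill dictionary), $|h|$ its code length in bits, $n$ the number of i.i.d. samples, $L$ and $\hat{L}$ the true and empirical risk for a loss bounded in $[0,1]$, and $\delta$ the failure probability.

```latex
% Standard Occam bound (Hoeffding + union bound weighted by 2^{-|h|}, using Kraft's inequality).
% Assumes i.i.d. samples and a loss in [0,1]; not the bound proved in the paper.
\[
  \Pr\!\left[\;\forall h:\;
    L(h) \;\le\; \hat{L}(h) \;+\;
    \sqrt{\frac{|h|\,\ln 2 + \ln(1/\delta)}{2n}}
  \;\right] \;\ge\; 1 - \delta
\]
```

Under these assumptions, a hypothesis that encodes in fewer bits has a tighter gap between empirical and true risk, which is the intuition behind penalizing trajectories that compress poorly.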
📝 Abstract
Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, decomposed into a small set of reusable abstract patterns. To formalize this, we introduce ReuseRL, which grounds agentic RL in the Minimum Description Length (MDL) principle. ReuseRL extracts a shared skill dictionary from successful trajectories and augments the RL objective with a segmentation cost, explicitly penalizing idiosyncratic behaviors that encode poorly. We prove a PAC-Bayes generalization bound for this compression penalty. Across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL improves in- and out-of-distribution success over vanilla GRPO and strong round-length baselines.
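The idea of a segmentation cost can be illustrated with a minimal sketch, assuming a fixed skill dictionary and a flat per-symbol code. It is not the ReuseRL implementation: the names `segmentation_cost` and `shaped_reward`, the parameters `primitive_bits`, `skill_bits`, and `lam`, and the toy action strings are all hypothetical. The sketch finds the cheapest way to cover a trajectory with dictionary skills and literal actions via dynamic programming, then subtracts a scaled cost from the task reward.

```python
import math
from typing import List, Sequence, Tuple

Skill = Tuple[str, ...]


def segmentation_cost(
    trajectory: Sequence[str],
    skills: Sequence[Skill],
    primitive_bits: float,
    skill_bits: float,
) -> float:
    """Minimum number of bits needed to encode `trajectory` as a sequence of
    dictionary skills (each costing `skill_bits`) and literal actions (each
    costing `primitive_bits`). Hypothetical cost model, for illustration only."""
    n = len(trajectory)
    best: List[float] = [math.inf] * (n + 1)  # best[i] = cheapest encoding of the first i actions
    best[0] = 0.0
    for i in range(n):
        # Option 1: emit the next action as a literal.
        best[i + 1] = min(best[i + 1], best[i] + primitive_bits)
        # Option 2: emit any skill that matches the trajectory starting at position i.
        for skill in skills:
            j = i + len(skill)
            if j <= n and tuple(trajectory[i:j]) == skill:
                best[j] = min(best[j], best[i] + skill_bits)
    return best[n]


def shaped_reward(task_reward: float, cost_bits: float, lam: float) -> float:
    """Task reward minus a scaled compression penalty (assumed linear form)."""
    return task_reward - lam * cost_bits


# Toy dictionary and trajectories (made up for this example).
skills = [("goto", "open", "take"), ("goto", "put")]
reusable = ["goto", "open", "take", "goto", "put"]
idiosyncratic = ["goto", "look", "open", "take", "goto", "examine", "put"]

cost_reusable = segmentation_cost(reusable, skills, primitive_bits=4.0, skill_bits=2.0)
cost_idiosyncratic = segmentation_cost(idiosyncratic, skills, primitive_bits=4.0, skill_bits=2.0)
```

In this toy setting the first trajectory is covered by two skills while the second matches no skill and must be spelled out action by action, so two equally successful rollouts would receive different shaped rewards. How the paper learns the dictionary, sets code lengths, and combines the penalty with GRPO is not specified in the abstract.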
Problem

Research questions and friction points this paper is trying to address.

skill reuse
reinforcement learning
generalization
compression
agentic RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Skill Reuse
Minimum Description Length
Agentic Reinforcement Learning
Trajectory Compression
PAC-Bayes Generalization