🤖 AI Summary
This work challenges the common assumption that skill libraries are model-agnostic: because large language models differ substantially in capability, the same skill can help one backbone and harm another. To address this, the authors propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target model in two stages. First, a hierarchical skill evolution pipeline rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles. Second, a lightweight model-conditioned rewriter is trained on the evolution trajectories so that the adaptation can be applied to new tasks in a single forward pass. MASA requires no modification of model weights, is described as the first approach to explicitly identify and mitigate model dependency in skill effectiveness, and transfers to unseen tasks and environments without additional search. Experiments across three interactive environments and four mainstream models show that MASA consistently outperforms strong baselines, with gains of up to 25.8 points over the strongest baseline, and that the learned rewriter outperforms a much larger teacher LLM at a fraction of the inference cost.
📝 Abstract
LLM agents increasingly rely on externally curated skills (procedural instructions retrieved at decision time) to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capacities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA (Model-Aware Skill Alignment), a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter trained on evolution trajectories to reproduce the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.
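The abstract does not spell out how the UCB-driven tree search chooses which candidate skill rewrite to expand next. As a rough illustration only, the sketch below uses the textbook UCB1 rule, which trades off a candidate's mean environment reward against how rarely it has been tried. The function names, the dictionary keys (`skill`, `total_reward`, `visits`), and the exploration constant `c` are all assumptions made for this example, not details taken from MASA.

```python
import math

def ucb1_score(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1 score (an assumption; MASA's exact rule is not given here).

    Unvisited candidates get +inf so each one is tried at least once.
    """
    if visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, c=math.sqrt(2)):
    """Pick the candidate skill rewrite with the highest UCB1 score.

    `children` is a list of dicts with hypothetical keys:
    'skill' (label), 'total_reward' (summed feedback), 'visits' (trial count).
    """
    # Clamp to 1 so log() is defined even when nothing has been visited yet.
    parent_visits = max(sum(ch["visits"] for ch in children), 1)

    def score(ch):
        mean = ch["total_reward"] / ch["visits"] if ch["visits"] else 0.0
        return ucb1_score(mean, ch["visits"], parent_visits, c)

    return max(children, key=score)

# Hypothetical candidates: A has a higher mean reward, B has been tried less often.
children = [
    {"skill": "A", "total_reward": 8.0, "visits": 10},
    {"skill": "B", "total_reward": 1.0, "visits": 2},
]
chosen = select_child(children)
```

With the default exploration constant, the rarely tried candidate `B` is selected despite its lower mean reward. Setting `c=0.0` reduces the rule to picking the candidate with the best mean so far, which behaves more like greedy hill climbing.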