SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to agent skill evolution typically assume a fixed set of tools and evaluate skills in isolation, so they struggle to handle tool-level failures and inter-skill interactions. This work proposes SkillSmith, the first framework that co-evolves skills and tools in a unified proposal space, jointly optimizing both through an ecological utility model inspired by Lotka-Volterra dynamics. SkillSmith incorporates an anti-pattern logging system to avoid conflicts and repeated failures, supports bundled operations such as encapsulation, composition, and decomposition of skill-tool pairs, and uses execution trace analysis to improve coordination among skills. Evaluated on three benchmarks, including WildClawBench, and five scales of Qwen3.5 models, SkillSmith significantly outperforms strong baselines, with particularly pronounced gains on high-complexity tasks that require multi-skill collaboration.
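As a rough illustration of the anti-pattern logging idea, the sketch below stores failure records and vetoes a proposal that reproduces a known failure signature. It is only a minimal sketch under stated assumptions: the `AntiPattern` and `AntiPatternLog` names, the representation of a signature as a set of string tags, and the subset-matching rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AntiPattern:
    # Hypothetical record layout: the summary mentions failure signatures,
    # causal attributions, and remedies, but not how they are encoded.
    signature: frozenset  # tags that characterise the failure (assumption)
    cause: str            # causal attribution
    remedy: str           # suggested fix


class AntiPatternLog:
    """Append-only log that can veto proposals matching a known failure."""

    def __init__(self) -> None:
        self._entries: List[AntiPattern] = []

    def record(self, entry: AntiPattern) -> None:
        self._entries.append(entry)

    def veto(self, proposal_tags: Iterable[str]) -> Optional[AntiPattern]:
        # Assumed matching rule: a proposal repeats a known mistake when it
        # contains every tag of a recorded failure signature.
        tags = frozenset(proposal_tags)
        for entry in self._entries:
            if entry.signature <= tags:
                return entry
        return None


log = AntiPatternLog()
known = AntiPattern(
    signature=frozenset({"tool:http_fetch", "no_timeout"}),
    cause="request hangs on slow hosts",
    remedy="wrap the tool with a timeout",
)
log.record(known)

blocked = log.veto(["tool:http_fetch", "no_timeout", "skill:scrape"])
allowed = log.veto(["tool:http_fetch", "timeout=10"])
```

Here `blocked` is the matching record, whose `remedy` field could be fed back into reflection, while `allowed` is `None` because the second proposal does not contain the full signature.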

📝 Abstract

Recent self-evolving agents have shown that skills can be discovered, refined, and accumulated through execution. However, existing skill-evolution frameworks typically assume a fixed tool layer and evaluate each skill independently, limiting their ability to repair tool-level failures or reason about interactions among skills. We propose SkillSmith, a synergy-aware skill-tool co-evolution framework. SkillSmith introduces a unified proposal space in which reflection produces atomic bundles that jointly modify skills and tools, allowing tools to be wrapped, edited, composed, split, or retired when skill evolution identifies a reusable capability gap. To guide this joint search, SkillSmith maintains an ecological utility model inspired by Lotka-Volterra dynamics, where an interaction matrix estimated from execution traces captures pairwise complementarity and conflict among skills and provides pressure signals for retrieval, mutation prioritization, and retirement. Furthermore, SkillSmith records anti-patterns, including failure signatures, causal attributions, and remedies, to accelerate diagnosis and veto proposals that repeat known mistakes. Experiments on three benchmarks, including WildClawBench, and five Qwen3.5 model scales show that SkillSmith consistently outperforms strong baselines, with gains that amplify as task complexity and multi-skill co-activation increase.
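To make the ecological utility idea concrete, the following sketch applies one explicit Euler step of the standard generalized Lotka-Volterra equations, du_i/dt = u_i (r_i + Σ_j A_ij u_j), to a vector of skill utilities. This is an illustrative assumption rather than the paper's actual update rule: the abstract only says the model is inspired by Lotka-Volterra dynamics, so the function names, the growth rates, the step size, the retirement threshold, and the example interaction matrix are all hypothetical.

```python
from typing import List


def lv_step(
    utility: List[float],
    growth: List[float],
    interaction: List[List[float]],
    dt: float = 0.1,
) -> List[float]:
    """One Euler step of du_i/dt = u_i * (r_i + sum_j A_ij * u_j).

    Positive A_ij models complementarity (skill j helps skill i),
    negative A_ij models conflict. Utilities are clipped at zero.
    """
    n = len(utility)
    updated = []
    for i in range(n):
        pressure = growth[i] + sum(interaction[i][j] * utility[j] for j in range(n))
        updated.append(max(0.0, utility[i] + dt * utility[i] * pressure))
    return updated


def retirement_candidates(utility: List[float], threshold: float = 0.05) -> List[int]:
    """Indices of skills whose utility fell below a (hypothetical) threshold."""
    return [i for i, u in enumerate(utility) if u < threshold]


# Hypothetical three-skill example: skills 0 and 1 complement each other,
# skill 2 conflicts with skill 0; the diagonal is self-limitation.
utility = [0.5, 0.5, 0.5]
growth = [0.2, 0.2, 0.2]
interaction = [
    [-1.0, 0.5, -0.3],
    [0.5, -1.0, 0.0],
    [-0.8, 0.0, -1.0],
]

next_utility = lv_step(utility, growth, interaction)
```

In this toy setup the conflicting skill 2 loses utility fastest in a single step, which is the kind of pressure signal that could feed retrieval ranking, mutation prioritization, or retirement. In the framework described above the interaction matrix would be estimated from execution traces rather than written by hand.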

Problem

Research questions and friction points this paper is trying to address.

skill evolution

tool layer

skill interaction

self-improving agents

capability gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill-tool co-evolution

unified proposal space

ecological utility model