SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of selecting skills efficiently and accurately from large skill repositories for large language model (LLM) agents. The authors propose a self-evolving, typed directed skill graph that explicitly models dependency, conflict, specialization, and redundancy relationships among skills. The framework exposes a structured retrieval interface that the agent queries during inference, combining vector-based matching, adjacency traversal over typed edges, and conflict detection, and it adds a propose-then-commit mechanism for online graph editing so that structure accumulates across tasks. With MiniMax-M2.7, the approach reaches a 67.1% success rate on ALFWorld and a 27.3% reward on SkillsBench, exceeding the strongest reported Graph-of-Skills baseline by 12.8 and 8.6 points, respectively; SkillsBench Ret@K rises from 65.5 to 78.2 under matched queries. Candidate ranking also stays robust as the skill pool grows 10x.
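To make the retrieval interface concrete, here is a minimal sketch of a typed skill graph whose search call returns vector matches, typed-edge neighbors, and conflict signals together. Only the four relationship types come from the summary; the class name `SkillGraph`, the edge-type strings, the method names, and the toy skills and embeddings are assumptions made for illustration and are not the authors' API.

```python
from dataclasses import dataclass, field
from math import sqrt

# The four relationship types come from the summary; the string labels are assumed.
EDGE_TYPES = {"depends_on", "conflicts_with", "specializes", "duplicates"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SkillGraph:
    """Hypothetical container: skill embeddings plus typed directed edges."""
    embeddings: dict = field(default_factory=dict)  # skill name -> vector
    edges: set = field(default_factory=set)         # (src, edge_type, dst)

    def add_skill(self, name, vector):
        self.embeddings[name] = vector

    def add_edge(self, src, edge_type, dst):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.edges.add((src, edge_type, dst))

    def search(self, query, k=2):
        # 1. Vector matches: top-k skills by cosine similarity to the query.
        ranked = sorted(self.embeddings,
                        key=lambda s: cosine(query, self.embeddings[s]),
                        reverse=True)
        matches = ranked[:k]
        # 2. Typed-edge neighbors: outgoing edges of every matched skill.
        neighbors = sorted(e for e in self.edges if e[0] in matches)
        # 3. Conflict signals: conflict edges joining two matched skills.
        conflicts = sorted((s, d) for s, t, d in self.edges
                           if t == "conflicts_with" and s in matches and d in matches)
        return {"matches": matches, "neighbors": neighbors, "conflicts": conflicts}


# Toy library with made-up skills and 2-d embeddings.
graph = SkillGraph()
graph.add_skill("heat_object", [1.0, 0.0])
graph.add_skill("cool_object", [0.9, 0.1])
graph.add_skill("open_fridge", [0.0, 1.0])
graph.add_edge("heat_object", "conflicts_with", "cool_object")
graph.add_edge("cool_object", "depends_on", "open_fridge")

result = graph.search([1.0, 0.0], k=2)
print(result)
```

In this toy case, embedding similarity alone ranks two mutually conflicting skills at the top and misses the dependency on `open_fridge`; the typed edges surface both facts in the same response, which is the kind of structure the summary says similarity matching cannot see.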

📝 Abstract

As LLM agents adopt large skill libraries, selecting the right subset becomes a structural problem rather than a similarity-matching one: skills depend on, conflict with, specialize, or duplicate one another, a structure invisible to both full enumeration and embedding similarity. We present SkillDAG, which models inter-skill relationships as a typed directed graph and exposes it to an LLM agent as an inference-time, agent-callable structural retrieval interface, queried and evolved during execution rather than baked into a fixed retrieval pipeline: each search returns vector matches, typed-edge neighbors, and conflict signals, and a propose-then-commit protocol lets the agent register execution-backed edges so the graph accumulates structure across episodes. On ALFWorld and SkillsBench with MiniMax-M2.7, SkillDAG reaches 67.1% success and 27.3% reward, exceeding the strongest reported Graph-of-Skills baseline by +12.8 and +8.6 points; the advantage ports to gpt-5.2-codex, and intrinsic SkillsBench Ret@K rises from 65.5 to 78.2 under matched queries. These gains trace to isolable mechanisms: candidate ranking that stays robust as the pool grows 10x where a fixed seeding-diffusion pipeline degrades, and set-monotone online edits that enlarge ground-truth recall without evicting prior hits.
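The propose-then-commit protocol and the set-monotone property can be illustrated with a small sketch. It rests on assumptions that the abstract does not confirm: "execution-backed" is read here as "the episode that proposed the edge succeeded", edits are add-only, and expansion follows one hop of `depends_on` edges. The names `EdgeLedger`, `propose`, `commit`, and `expand` are hypothetical.

```python
class EdgeLedger:
    """Hypothetical add-only store for edges proposed by the agent."""

    def __init__(self):
        self.committed = set()  # (src, edge_type, dst) edges visible to retrieval
        self.pending = {}       # episode id -> edges proposed in that episode

    def propose(self, episode_id, src, edge_type, dst):
        # Proposals are staged per episode and do not affect retrieval yet.
        self.pending.setdefault(episode_id, set()).add((src, edge_type, dst))

    def commit(self, episode_id, succeeded):
        # Assumption: an edge is "execution-backed" if its episode succeeded.
        proposals = self.pending.pop(episode_id, set())
        if not succeeded:
            return set()
        added = proposals - self.committed
        self.committed |= added  # add-only: existing edges are never removed
        return added

    def expand(self, seeds):
        # Seeds plus one-hop dependency neighbors over committed edges.
        hits = set(seeds)
        for src, edge_type, dst in self.committed:
            if src in seeds and edge_type == "depends_on":
                hits.add(dst)
        return hits


ledger = EdgeLedger()
before = ledger.expand({"cook_meal"})

ledger.propose("ep1", "cook_meal", "depends_on", "heat_object")
ledger.commit("ep1", succeeded=True)    # committed

ledger.propose("ep2", "cook_meal", "depends_on", "cool_object")
ledger.commit("ep2", succeeded=False)   # discarded

after = ledger.expand({"cook_meal"})
print(sorted(before), sorted(after))
```

Because committed edges are only ever added, the expanded set after any commit is a superset of the set before it, so a skill retrieved earlier cannot be evicted by a later edit. That is one simple way to obtain the monotonicity the abstract describes; the paper's actual mechanism may differ.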

Problem

Research questions and friction points this paper is trying to address.

LLM skill selection

skill relationships

structured retrieval

skill libraries

graph-based reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

typed skill graph

self-evolving structure

structural retrieval