SkillPager: Query-Adaptive Intra-Skill Navigation via Semantic Node Retrieval

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the context redundancy and dilution of execution-critical information that arise when skill-based LLM agents consume long procedural documents through full-document prompting. The authors propose SkillPager, a two-stage framework that brings typed semantic granularity to skill document retrieval. Offline, SkillPager parses each Markdown skill document into typed semantic nodes; online, it applies query-adaptive Maximal Marginal Relevance (MMR) retrieval to select a minimal yet execution-sufficient context. On a benchmark of 395 skills and 1,975 queries, SkillPager reaches 78.89% LLM-judged context sufficiency, 3.34 percentage points below the full-document baseline (82.23%), while reducing prompt tokens by 47.04%. It also outperforms the strongest graph-based baseline by 12.16%, and a granularity ablation attributes the efficiency gains to typed semantic nodes rather than to the retrieval algorithm alone.
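To make the offline stage concrete, the following is a minimal sketch of splitting a Markdown skill document into typed nodes. It is not the authors' parser: the summary does not list SkillPager's node types, so the labels used here (`heading`, `paragraph`, `step`, `code`), the `Node` class, and the `parse_skill` function are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Node:
    kind: str     # hypothetical type label, not taken from the paper
    section: str  # text of the nearest enclosing heading
    text: str     # node content


def parse_skill(markdown: str) -> List[Node]:
    """Split a Markdown document into coarse typed nodes (illustrative only)."""
    nodes: List[Node] = []
    section = ""
    paragraph: List[str] = []  # lines of the paragraph being collected
    code: List[str] = []       # lines of the code block being collected
    in_code = False

    def flush_paragraph() -> None:
        # Emit the pending paragraph, if any, as a single node.
        if paragraph:
            nodes.append(Node("paragraph", section, " ".join(paragraph)))
            paragraph.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if in_code:
                # Closing fence: emit the collected code block.
                nodes.append(Node("code", section, "\n".join(code)))
                code.clear()
            else:
                flush_paragraph()
            in_code = not in_code
        elif in_code:
            code.append(line)
        elif stripped.startswith("#"):
            flush_paragraph()
            section = stripped.lstrip("#").strip()
            nodes.append(Node("heading", section, section))
        elif stripped.startswith(("- ", "* ")):
            flush_paragraph()
            nodes.append(Node("step", section, stripped[2:].strip()))
        elif stripped:
            paragraph.append(stripped)
        else:
            flush_paragraph()
    flush_paragraph()
    return nodes
```

Each node keeps the heading it appeared under, so a retriever can later return a single step or code block together with its section label instead of the whole document.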

📝 Abstract

Skill-based LLM agents increasingly rely on long procedural documents, but full-document prompting wastes tokens and dilutes information critical to execution. We study this setting as intra-skill retrieval, where the goal is to select a minimal, execution-sufficient context from a known skill document given a query. We present SkillPager, a two-stage framework that parses each Markdown skill into typed semantic nodes offline and leverages Maximal Marginal Relevance (MMR) to perform global, query-conditioned node selection online. On a benchmark of 395 skills and 1,975 queries, SkillPager achieves 78.89% LLM-judged context sufficiency, compared to 82.23% for the exhaustive full-document baseline, while reducing prompt tokens by 47.04%. A granularity ablation shows that applying the same retrieval algorithm to raw fixed-length chunks reaches a comparable 81.77% sufficiency but increases token cost by 28.81%, demonstrating that efficiency gains are driven by typed semantic granularity rather than the retrieval algorithm alone. Among graph-based baselines, SkillPager outperforms the strongest baseline by a margin of 12.16%. Further ablations show that supporting content is most effective when retained in the candidate pool and selected adaptively rather than removed by static heuristics. These results identify typed intra-document retrieval as a distinct access problem for skill-based agents.
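As an illustration of the online stage, here is a minimal sketch of the standard greedy MMR selection rule applied to node embeddings. It shows the textbook formulation, which scores each candidate as `lam * relevance - (1 - lam) * redundancy`, and not necessarily the exact variant used in SkillPager. The function names, the fixed budget `k`, the trade-off weight `lam`, and the use of cosine similarity over plain float vectors are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query_vec: Sequence[float],
    node_vecs: Sequence[Sequence[float]],
    k: int,
    lam: float = 0.7,  # assumed trade-off weight; 1.0 means pure relevance
) -> List[int]:
    """Greedily pick up to k node indices by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, v) for v in node_vecs]
    selected: List[int] = []
    remaining = list(range(len(node_vecs)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to any already selected node.
        redundancy = max(
            (cosine(node_vecs[i], node_vecs[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Node 1 is a near-duplicate of node 0; node 2 covers a different aspect.
nodes = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]
query = [1.0, 0.3]
print(mmr_select(query, nodes, k=2, lam=0.5))  # [1, 2]: skips the near-duplicate
print(mmr_select(query, nodes, k=2, lam=1.0))  # [1, 0]: relevance only
```

With `lam=0.5` the second pick moves away from the near-duplicate node, which is the behavior that lets a query-conditioned selector avoid spending tokens on redundant context. A token budget could replace the fixed `k`, but that is a design choice the abstract does not specify.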

Problem

Research questions and friction points this paper is trying to address.

intra-skill retrieval

semantic node retrieval

skill-based agents

context sufficiency

typed granularity

Innovation

Methods, ideas, or system contributions that make the work stand out.

intra-skill retrieval

semantic node parsing

typed granularity