Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

This work addresses the critical security gap in open agent skill ecosystems, where malicious skills often masquerade under benign descriptions and existing defenses lack a unified benchmark combining semantic analysis with runtime verification. To bridge this gap, we propose SkillVetBench—the first end-to-end security evaluation benchmark for agent skills. It operates in two stages: first, it employs natural language semantic analysis to detect latent malicious intent; second, it executes suspicious skills within an isolated sandbox, monitoring privileged primitives (e.g., exec, write_file) and inter-component interactions to generate auditable execution traces as forensic evidence. Experimental results demonstrate that approaches relying solely on semantic or signature-based detection miss up to 89% of malicious skills, whereas SkillVetBench effectively captures runtime attacks and provides concrete, interpretable evidence for definitive security judgments.

📝 Abstract

Open agent platforms allow community contributors to publish reusable skills that agents can invoke at runtime. This extensibility also creates a supply-chain risk: malicious contributors can hide harmful behavior inside skills that appear benign under superficial inspection. However, existing defenses are hard to evaluate because there is no benchmark that measures both malicious-skill detection and runtime verification. We present SkillVetBench, a two-stage security vetting benchmark for open agentic skill ecosystems. The first stage performs semantic vetting over each skill's natural-language specification to detect hidden malicious intent. The second stage executes flagged skills in an instrumented sandbox to observe runtime behavior and collect auditable evidence. We build a benchmark from confirmed malicious skills in the live OpenClaw ecosystem, including samples from the recent ClawHavoc supplychain campaign. Unlike static-only methods, SkillVetBench verifies detected threats with execution traces. Our experiments show that: (1) semantic-only and signature-based baselines are insufficient, missing up to 89\% of malicious skills whose threats arise from natural-language instructions, multicomponent logic, or cross-component interactions; (2) runtime attacks are concentrated in a small set of high-permission primitives, especially exec, write\_file, install\_skill, and spawn; and (3) SkillVetBench provides case studies in which sandbox execution directly supports malicious verdicts with concrete runtime evidence.

Problem

Research questions and friction points this paper is trying to address.

security risk

open agentic skill ecosystems

malicious skill detection

runtime verification

supply-chain attack

Innovation

Methods, ideas, or system contributions that make the work stand out.

SkillVetBench

agentic skill ecosystems

semantic vetting