SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

📅 2026-06-01
📈 Citations: 0
Influential: 0
📄 PDF

career value

210K/year
🤖 AI Summary
Existing safety evaluation methods for autonomous agents predominantly rely on manually designed tasks, which offer limited coverage and focus solely on final outputs, thereby failing to capture unsafe behaviors that emerge during execution in complex environments. To address this limitation, this work proposes SeClaw, a novel framework that, for the first time, automatically generates safety evaluation tasks from structured risk specifications. By integrating Docker-based containerized testing environments with a trajectory-aware assessment mechanism, SeClaw enables fine-grained, reproducible safety evaluations of agent behavior across multiple risk dimensions—including resources, tasks, environment, and self-conducted actions—throughout the entire execution trajectory. The framework establishes a standardized benchmark encompassing a broad spectrum of safety threats, providing a measurable, diagnosable, and comparable foundation for evaluating LLM-based agents.
📝 Abstract
Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather than the execution processes that lead to unsafe behavior. We introduce SeClaw, a framework that combines specification-driven security task synthesis with execution-based security evaluation for Autonomous agents. Spec-driven security task synthesis enables scalable and controllable construction of security tasks from structured risk specifications, while SeClaw docker provides a standardized testbed for evaluating agent behavior under diverse safety-risk scenarios. The benchmark covers risks arising from resources, user tasks, environments, and intrinsic agent behaviors, and supports trajectory-aware assessment of unsafe actions beyond final responses. By bridging systematic task synthesis and reproducible security evaluation, SeClaw provides a practical foundation for measuring, diagnosing, and comparing security failures in autonomous LLM agents. The code is available at https://github.com/seclaw-eval/seclaw-eval.
Problem

Research questions and friction points this paper is trying to address.

autonomous agents
security evaluation
LLM safety
task synthesis
execution trajectory
Innovation

Methods, ideas, or system contributions that make the work stand out.

specification-driven synthesis
autonomous agents
security evaluation
trajectory-aware assessment
LLM safety
💼 Related Jobs
Hao Cheng
Hao Cheng
HKBU
RobustnessData quality
Changtao Miao
Changtao Miao
University of Science and Technology of China
AI
T
Tianle Song
Xi’an Jiaotong University
Yin Wu
Yin Wu
Karlsruher Institut für Technologie
Autonomous DrivingADASScenario ExtractionAnomaly Detection
H
He Liu
Ant Digital Technologies, Ant Group
Erjia Xiao
Erjia Xiao
The Hong Kong University of Science and Technology
Machine Learning
J
Junchi Chen
Ant Digital Technologies, Ant Group
X
Xiaoyu Shi
Ant Digital Technologies, Ant Group
Y
Yichi Wang
University of Oxford
J
Jing Yang
City University of Hong Kong
T
Taowen Wang
The Hong Kong University of Science and Technology (Guangzhou)
Jinhao Duan
Jinhao Duan
Postdoc@UNC-Chapel Hill, Ph.D.@Drexel University
AI4ScienceTrustworthy MLGenerative AI
Mengshu Sun
Mengshu Sun
Beijing University of Technology
Deep LearningModel Compression and Acceleration
P
Peiyan Dong
Massachusetts Institute of Technology
Xuan Shen
Xuan Shen
Cornell Tech, Northeastern University
Efficient Deep LearningML SystemsAutoML
Yang Cao
Yang Cao
Institute of Science Tokyo (formerly Tokyo Tech)
Differential PrivacyFederated LearningData EconomyTrustworthy Data Science
Renjing Xu
Renjing Xu
HKUST(GZ)
Brain-inspired ComputingHumanoid Computing
Kaidi Xu
Kaidi Xu
Associate Professor, City University of Hong Kong
AI SecurityUncertainty QuantificationFormal Verification
Jindong Gu
Jindong Gu
Google Research & DeepMind, University of Oxford
Trustworthy AIAI SafetyMultimodal AI
Bo Zhang
Bo Zhang
Alibaba Group
NLP
Jize Zhang
Jize Zhang
Assistant Professor, The Hong Kong University of Science and Technology (HKUST)
Uncertainty QuantificationStorm SurgeSurrogate ModelingCoastal Hazards
Chenhao Lin
Chenhao Lin
Xi'an JiaoTong University
AICVPRML
Philip Torr
Philip Torr
Professor, University of Oxford
Department of Engineering
Chao Shen
Chao Shen
Chair Professor, Xi'an Jiaotong University
AI SecuritySoftware SecurityControl System