Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

📅 2026-04-27
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Autonomous AI agents remain vulnerable to threats such as prompt injection and memory poisoning, while existing defenses are largely confined to the platform perimeter and leave the agent's own security-aware reasoning untrained. This work proposes ClawdGo, a framework for endogenous security awareness training that improves an agent's threat recognition and reasoning at inference time without modifying the underlying model. ClawdGo contributes four components: a three-layer domain taxonomy (TLDT) of 12 trainable dimensions, an autonomous security awareness training loop (ASAT), a cross-session memory accumulation architecture (CSMA), and a formalized security awareness calibration problem (SACP). These are combined with a three-role self-play scheme (attacker, defender, evaluator), weakest-first curriculum scheduling, and axiom crystallization. Experiments show that over 16 training sessions the average TLDT score rose from 80.9 to 96.9, covering 11 of 12 dimensions, and CSMA retained the full gain across sessions; a cold-start ablation recovered only 2.4 points, leaving a 13.6-point gap.
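The headline figures are mutually consistent. Reading the 13.6-point gap as the distance between the cold-start score and the fully trained score (an interpretation, not a statement from the paper), the arithmetic works out as follows, with cold start recovering about 15% of the trained gain:

```latex
\Delta_{\text{trained}} = 96.9 - 80.9 = 16.0, \qquad
\Delta_{\text{cold}} = 2.4, \qquad
\text{gap} = \Delta_{\text{trained}} - \Delta_{\text{cold}} = 16.0 - 2.4 = 13.6, \qquad
\frac{\Delta_{\text{cold}}}{\Delta_{\text{trained}}} = \frac{2.4}{16.0} = 0.15
```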

📝 Abstract
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving the agent's own threat judgement entirely untrained. We present ClawdGo, a framework for endogenous security awareness training: we teach the agent to recognise and reason about threats from the inside, at inference time, with no model modification. Four contributions are introduced: TLDT (Three-Layer Domain Taxonomy) organises 12 trainable dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers; ASAT (Autonomous Security Awareness Training) is a self-play loop where the agent alternates attacker, defender, and evaluator roles under weakest-first curriculum scheduling; CSMA (Cross-Session Memory Accumulation) compounds skill gains via a four-layer persistent memory architecture and Axiom Crystallisation Promotion (ACP); and SACP (Security Awareness Calibration Problem) formalises the precision-recall tradeoff introduced by endogenous training. Live experiments show weakest-first ASAT raises average TLDT score from 80.9 to 96.9 over 16 sessions, outperforming uniform-random scheduling by 6.5 points and covering 11 of 12 dimensions. CSMA retains the full gain across sessions; cold-start ablation recovers only 2.4 points, leaving a 13.6-point gap. E-mode generates 32 TLDT-conformant scenarios covering all 12 dimensions. The calibration problem formalised by SACP is observed in practice when a heavily trained agent classifies a legitimate capability assessment as prompt injection (30/160).
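A minimal sketch of one ASAT session under weakest-first scheduling, assuming the per-dimension TLDT scores are kept in a dictionary and that the attacker, defender, and evaluator roles can be modelled as plain callables (in ClawdGo all three roles are played by the same agent at inference time). The names `weakest_first` and `asat_session`, the dimension labels, the 0-100 grade, and the moving-average update with `alpha` are hypothetical illustrations, not the paper's implementation.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def weakest_first(scores: Scores) -> str:
    """Return the lowest-scoring dimension; ties go to the alphabetically first name."""
    return min(scores, key=lambda dim: (scores[dim], dim))


def asat_session(
    scores: Scores,
    attack: Callable[[str], str],
    defend: Callable[[str], str],
    evaluate: Callable[[str, str, str], float],
    alpha: float = 0.5,
) -> str:
    """Run one hypothetical self-play round on the weakest dimension and update its score."""
    dim = weakest_first(scores)
    scenario = attack(dim)                      # attacker role: craft a threat scenario
    response = defend(scenario)                 # defender role: respond to the scenario
    grade = evaluate(dim, scenario, response)   # evaluator role: grade the response (0-100)
    # Exponential moving average; alpha is an assumed smoothing factor.
    scores[dim] = (1 - alpha) * scores[dim] + alpha * grade
    return dim


# Hypothetical dimension labels and stand-in role functions for illustration.
scores = {"prompt_injection": 70.0, "memory_poisoning": 85.0, "social_engineering": 90.0}
picked = asat_session(
    scores,
    attack=lambda dim: f"adversarial scenario for {dim}",
    defend=lambda scenario: "refuse and flag",
    evaluate=lambda dim, scenario, response: 100.0 if "flag" in response else 0.0,
)
print(picked, scores[picked])  # prompt_injection 85.0
print(weakest_first(scores))   # memory_poisoning
```

Always drilling the lowest-scoring dimension is what separates weakest-first scheduling from the uniform-random baseline the abstract compares against: once `prompt_injection` rises to 85.0, it ties with `memory_poisoning` and the next session moves to that dimension.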
Problem

Research questions and friction points this paper is trying to address.

autonomous AI agents
prompt injection
memory poisoning
social engineering
supply-chain attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Endogenous Security Awareness
Autonomous AI Agents
Self-Play Training
Persistent Memory Architecture
Security Calibration
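SACP frames the cost of endogenous training as a precision-recall tradeoff. The toy calculation below assumes the agent's threat judgement can be summarised as a suspicion score compared against a threshold. It shows how a more suspicious agent (lower threshold) catches every attack but also flags benign inputs, the failure mode the abstract reports for a legitimate capability assessment. The scores, labels, thresholds, and the `precision_recall` helper are hypothetical; the paper's formalisation may differ.

```python
def precision_recall(suspicion, is_attack, threshold):
    """Precision and recall when inputs with suspicion >= threshold are flagged as threats.

    By convention here, precision is 1.0 when nothing is flagged and recall is 1.0
    when there are no attacks.
    """
    flagged = [s >= threshold for s in suspicion]
    tp = sum(f and y for f, y in zip(flagged, is_attack))          # attacks caught
    fp = sum(f and not y for f, y in zip(flagged, is_attack))      # benign inputs flagged
    fn = sum((not f) and y for f, y in zip(flagged, is_attack))    # attacks missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical suspicion scores; True marks a real attack, False a benign request
# such as a legitimate capability assessment.
suspicion = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
is_attack = [True, True, False, True, False, False, False, False]

for t in (0.85, 0.5):  # the lower threshold stands in for a more suspicious agent
    p, r = precision_recall(suspicion, is_attack, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.85: precision=1.00 recall=0.67
# threshold=0.5: precision=0.60 recall=1.00
```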
Jiaqi Li
Unknown affiliation
Machine Learning, Deep Learning
Yang Zhao
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Bin Sun
Network Management Center, China Mobile Group Liaoning Company Limited, Liaoning, China
Yang Yu
Tencent Security Xuanwu Lab, Haidian District, Beijing, China
Jian Chang
China Unicom Online Information Technology Co., Ltd., Beijing, China
Lidong Zhai
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China