$\alpha^3$-SecBench: A Large-Scale Evaluation Suite of Security, Resilience, and Trust for LLM-based UAV Agents over 6G Networks

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the critical gap in systematically evaluating the security, resilience, and trustworthiness of large language model (LLM)-driven UAV agents operating in adversarial environments over 6G networks. To this end, the authors propose α³-SecBench, the first comprehensive security evaluation framework of its kind, establishing a large-scale benchmark that spans a seven-layer autonomy architecture covering sensing, perception, planning, control, communication, edge/cloud infrastructure, and LLM reasoning. The framework incorporates 20,000 validated attack scenarios and combines adversarial task augmentation, cross-layer attack modeling, and automated metric quantification (covering attack detection, resilience degradation, and policy compliance) to evaluate 23 mainstream LLMs on episodes sampled from a corpus of 113,475 missions spanning 175 threat categories. Results reveal low overall scores ranging from 12.9% to 57.1%, highlighting a significant gap between anomaly detection and security-aware decision-making, and thereby filling a crucial void in the trustworthy evaluation of autonomous systems under adversarial conditions.

📝 Abstract
Autonomous unmanned aerial vehicle (UAV) systems are increasingly deployed in safety-critical, networked environments where they must operate reliably in the presence of malicious adversaries. While recent benchmarks have evaluated large language model (LLM)-based UAV agents in reasoning, navigation, and efficiency, systematic assessment of security, resilience, and trust under adversarial conditions remains largely unexplored, particularly in emerging 6G-enabled settings. We introduce $\alpha^{3}$-SecBench, the first large-scale evaluation suite for assessing the security-aware autonomy of LLM-based UAV agents under realistic adversarial interference. Building on multi-turn conversational UAV missions from $\alpha^{3}$-Bench, the framework augments benign episodes with 20,000 validated security overlay attack scenarios targeting seven autonomy layers, including sensing, perception, planning, control, communication, edge/cloud infrastructure, and LLM reasoning. $\alpha^{3}$-SecBench evaluates agents across three orthogonal dimensions: security (attack detection and vulnerability attribution), resilience (safe degradation behavior), and trust (policy-compliant tool usage). We evaluate 23 state-of-the-art LLMs from major industrial providers and leading AI labs using thousands of adversarially augmented UAV episodes sampled from a corpus of 113,475 missions spanning 175 threat types. While many models reliably detect anomalous behavior, effective mitigation, vulnerability attribution, and trustworthy control actions remain inconsistent. Normalized overall scores range from 12.9% to 57.1%, highlighting a significant gap between anomaly detection and security-aware autonomous decision-making. We release $\alpha^{3}$-SecBench on GitHub: https://github.com/maferrag/AlphaSecBench
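The abstract reports a single normalized overall score per model, aggregated from the three orthogonal dimensions (security, resilience, trust). As a rough illustration of how such an aggregate might be computed, here is a minimal sketch; the equal weighting and aggregation rule are assumptions for illustration, not the paper's actual scoring method.

```python
# Hypothetical sketch of aggregating per-dimension benchmark scores into a
# normalized overall score. The equal-weight mean is an assumption; the
# paper does not specify its aggregation rule here.

def overall_score(security: float, resilience: float, trust: float) -> float:
    """Equal-weight mean of three dimension scores, each in [0, 1]."""
    for s in (security, resilience, trust):
        if not 0.0 <= s <= 1.0:
            raise ValueError("dimension scores must lie in [0, 1]")
    return (security + resilience + trust) / 3.0

# Example: a model with strong detection but weak mitigation and trust
# behavior still lands at a modest overall score.
print(f"{overall_score(0.60, 0.45, 0.30):.1%}")  # → 45.0%
```

Under this equal-weight reading, a model can only reach a high overall score by performing well on all three dimensions, which is consistent with the paper's finding that reliable anomaly detection alone does not translate into high scores.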
Problem

Research questions and friction points this paper is trying to address.

LLM-based UAV agents
security
resilience
trust
6G networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

security evaluation
LLM-based UAV agents
adversarial resilience
6G networks
trustworthy autonomy
M. Ferrag
Department of Computer and Network Engineering, College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
Abderrahmane Lakas
Professor, Computer Engineering, UAE University
Mobile Networks, Vehicular Networks, IoT, Unmanned Vehicles, AI
M. Debbah
Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates