AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

📅 2026-06-10

🤖 AI Summary

This work proposes an open-source, modular framework driven by a large language model (LLM) that lets unmanned aerial vehicles (UAVs) act as autonomous aerial agents that understand and execute natural-language instructions, a capability that conventional UAV systems built on predefined commands lack. The framework adopts a brain-skill-runtime architecture that combines hard skills for atomic UAV operations with Markdown-based soft skills, document-driven state management, memory-driven reflection, and safety-oriented runtime validation. It supports three execution environments: lightweight mock execution, PX4 Software-in-the-Loop (SITL) with Gazebo, and AirSim-based simulation. The resulting system runs a closed loop of task comprehension, decision-making, skill execution, and feedback-driven adaptation, and is designed for reproducibility and extensibility.

📝 Abstract

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.
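The closed loop described in the abstract (decide, validate, execute a skill, observe feedback, decide again) can be illustrated with a minimal Python sketch. Every name below (`DroneState`, `HARD_SKILLS`, `validate`, `scripted_policy`, `run_mission`) and the 30 m altitude limit are hypothetical and are not taken from the AerialClaw codebase. A scripted function stands in for the LLM, and an in-memory state object stands in for the mock execution backend.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical illustration only: none of these names come from AerialClaw.

@dataclass
class DroneState:
    """In-memory stand-in for a mock execution backend."""
    altitude_m: float = 0.0
    armed: bool = False

# An action is a skill name plus keyword arguments for that skill.
Action = Tuple[str, Dict[str, float]]

def takeoff(state: DroneState, altitude_m: float) -> str:
    """Hard skill: atomic takeoff to a target altitude."""
    state.armed = True
    state.altitude_m = altitude_m
    return f"reached {altitude_m:.1f} m"

def land(state: DroneState) -> str:
    """Hard skill: atomic landing."""
    state.altitude_m = 0.0
    state.armed = False
    return "landed"

HARD_SKILLS: Dict[str, Callable[..., str]] = {"takeoff": takeoff, "land": land}

MAX_ALTITUDE_M = 30.0  # assumed safety limit, chosen for this example

def validate(action: Action) -> Optional[str]:
    """Runtime validation: return a rejection reason, or None if allowed."""
    name, args = action
    if name not in HARD_SKILLS:
        return f"unknown skill '{name}'"
    if name == "takeoff" and args.get("altitude_m", 0.0) > MAX_ALTITUDE_M:
        return f"altitude above {MAX_ALTITUDE_M:.0f} m limit"
    return None

def scripted_policy(mission: str, history: List[str]) -> Optional[Action]:
    """Stand-in for the LLM: choose the next action from past feedback."""
    if not history:
        return ("takeoff", {"altitude_m": 50.0})  # deliberately unsafe first try
    if history[-1].startswith("rejected"):
        return ("takeoff", {"altitude_m": 10.0})  # adapt after the rejection
    if history[-1].startswith("reached"):
        return ("land", {})
    return None  # nothing left to do

def run_mission(mission: str,
                policy: Callable[[str, List[str]], Optional[Action]] = scripted_policy,
                max_steps: int = 10) -> List[str]:
    """Closed loop: decide, validate, execute, record feedback, repeat."""
    state = DroneState()
    history: List[str] = []
    for _ in range(max_steps):
        action = policy(mission, history)
        if action is None:
            break
        reason = validate(action)
        if reason is not None:
            feedback = f"rejected: {reason}"
        else:
            name, args = action
            feedback = HARD_SKILLS[name](state, **args)
        history.append(feedback)
    return history

if __name__ == "__main__":
    for line in run_mission("take off, then land"):
        print(line)
```

In this sketch the validator rejects the first, unsafe takeoff request, and the policy reads that rejection from the history and retries at a lower altitude. This is the kind of feedback-driven adaptation the abstract describes, reduced to a toy scale. A real deployment would replace `scripted_policy` with a model backend and `DroneState` with a simulator or flight-stack adapter.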

Problem

Research questions and friction points this paper is trying to address.

autonomous aerial agents

unmanned aerial vehicles

LLM-driven decision-making

modular UAV framework

natural-language mission interpretation

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven aerial agents

modular brain-skill-runtime architecture

document-driven agent state