RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the open-set semantic navigation challenge for UAVs in large-scale, unstructured outdoor environments with sparse, long-range targets. Methodologically, we propose a 3D semantic navigation framework featuring a spatially consistent semantic voxel-ray map as persistent memory—integrating short-range voxel search and long-range ray search—and augmenting spatial reasoning with vision-language models (VLMs) to provide cross-modal semantic cues. An adaptive, behavior-tree-driven decision mechanism coordinates reactive responses with global planning. Key contributions include: (i) the first online-updatable 3D semantic memory supporting large-scale open-set navigation; (ii) VLM-enhanced cross-modal spatial reasoning; and (iii) a real-time adaptive behavior-switching strategy. Evaluated across 10 simulated environments and 100 navigation tasks, our method outperforms baselines by 85.25% in success rate and has been successfully deployed and validated in real-world outdoor scenarios.
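The summary's central data structure is a semantic voxel-ray map: close-range detections are fused into a 3D voxel grid, while distant detections (beyond reliable depth range) are stored as bearing-only rays. A minimal sketch of that idea is below; the class name, fusion rule, and all fields are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a semantic voxel-ray memory: voxels hold
# per-label confidences on a coarse grid, rays hold long-range cues.
VOXEL_SIZE = 2.0  # meters; assumed resolution


class SemanticVoxelRayMap:
    def __init__(self):
        self.voxels = {}  # (ix, iy, iz) -> {label: confidence}
        self.rays = []    # (origin, direction, label, confidence)

    def _key(self, position):
        # Snap a metric 3D point to its voxel index.
        return tuple(int(c // VOXEL_SIZE) for c in position)

    def add_voxel_obs(self, position, label, conf):
        # Short-range detection: fuse into the grid, keeping the
        # maximum confidence seen per label (simple fusion rule).
        cell = self.voxels.setdefault(self._key(position), {})
        cell[label] = max(cell.get(label, 0.0), conf)

    def add_ray_obs(self, origin, direction, label, conf):
        # Long-range detection with unreliable depth: store as a ray.
        self.rays.append((origin, direction, label, conf))

    def query(self, label, min_conf=0.5):
        # Return confident voxel hits plus long-range ray cues,
        # supporting both short- and long-range search.
        hits = [k for k, c in self.voxels.items()
                if c.get(label, 0.0) >= min_conf]
        cues = [r for r in self.rays
                if r[2] == label and r[3] >= min_conf]
        return hits, cues
```

Because the map persists across observations, a planner can query it for goal hypotheses long after the target left the field of view, which is what enables long-horizon (rather than purely reactive) behavior.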

📝 Abstract
Aerial outdoor semantic navigation requires robots to explore large, unstructured environments to locate target objects. Recent advances in semantic navigation have demonstrated open-set object-goal navigation in indoor settings, but these methods remain limited by constrained spatial ranges and structured layouts, making them unsuitable for long-range outdoor search. While outdoor semantic navigation approaches exist, they either rely on reactive policies based on current observations, which tend to produce short-sighted behaviors, or precompute scene graphs offline for navigation, limiting adaptability to online deployment. We present RAVEN, a 3D memory-based, behavior tree framework for aerial semantic navigation in unstructured outdoor environments. It (1) uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors, (2) combines short-range voxel search and long-range ray search to scale to large environments, (3) leverages a large vision-language model to suggest auxiliary cues, mitigating sparsity of outdoor targets. These components are coordinated by a behavior tree, which adaptively switches behaviors for robust operation. We evaluate RAVEN in 10 photorealistic outdoor simulation environments over 100 semantic tasks, encompassing single-object search, multi-class, multi-instance navigation and sequential task changes. Results show RAVEN outperforms baselines by 85.25% in simulation and demonstrate its real-world applicability through deployment on an aerial robot in outdoor field tests.
Problem

Research questions and friction points this paper is trying to address.

Enables aerial robots to locate target objects in large unstructured outdoor environments
Overcomes limitations of short-sighted reactive policies and offline precomputation methods
Addresses the sparsity of outdoor semantic targets via auxiliary cues suggested by a vision-language model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses semantic voxel-ray map for long-horizon planning
Combines short-range voxel and long-range ray search
Leverages vision-language model for auxiliary cues
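These components are coordinated by a behavior tree that adaptively switches behaviors. The priority logic can be sketched as a single selector node: try the most direct behavior first and fall back through memory-driven and cue-driven modes to exploration. The behavior names and state fields below are hypothetical placeholders, not the paper's actual tree.

```python
def tick(state):
    """One behavior-tree tick: a 'selector' node returns the first
    behavior whose precondition holds, in priority order.
    All names here are illustrative assumptions."""
    if state.get("target_visible"):
        return "approach_target"          # reactive: target in view
    if state.get("memory_hypotheses"):
        return "navigate_to_hypothesis"   # exploit persistent voxel-ray memory
    if state.get("vlm_cues"):
        return "follow_vlm_cue"           # cross-modal auxiliary cue
    return "explore_frontier"             # fallback: expand coverage
```

Ticking this selector at a fixed rate lets the robot switch behaviors the moment the world state changes, e.g. dropping from memory-driven navigation back to direct approach as soon as the target reappears.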