AI Summary
To address the challenge of real-time obstacle avoidance and safety assurance for unmanned aerial vehicles (UAVs) performing vision-language navigation (VLN) under natural-language instructions, this paper proposes a scene-aware adaptive safety boundary algorithm. The method introduces a novel depth-map-driven control barrier function (CBF), tightly integrating RGB-D sensing, CLIP-based language understanding, and YOLO-based object detection to identify moving obstacles and adapt safety margins online, overcoming the limitations of conventional static safety constraints. The approach is evaluated using ROS on a Parrot Bebop2 quadrotor in the Gazebo simulator. Experimental results show that, compared to a CBF-free baseline, the proposed method improves task success rates by 59.4% to 61.8% with only a marginal (5.4% to 8.2%) increase in trajectory length, while enabling real-time recovery from hazardous states.
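The core idea above (evaluating a depth-map-driven CBF with an adaptive margin and filtering the commanded velocity) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the adaptive-margin form `d_safe + margin_gain / d_min`, and the discrete-time CBF condition `v <= alpha * h` are illustrative assumptions.

```python
import numpy as np

def asma_cbf_filter(depth_crop, v_cmd, d_safe=1.0, margin_gain=0.5, alpha=2.0):
    """Hypothetical sketch of a depth-driven CBF velocity filter.

    depth_crop : 2-D array of depth values (m) cropped around a tracked obstacle
    v_cmd      : commanded forward velocity (m/s) from the VLN policy
    Returns a possibly reduced velocity keeping the barrier h = d_min - d_adapt >= 0.
    """
    d_min = float(np.min(depth_crop))               # closest point in the crop
    # Adaptive margin (assumed form): widen the safety distance as the obstacle nears.
    d_adapt = d_safe + margin_gain / max(d_min, 1e-3)
    h = d_min - d_adapt                             # barrier value
    if h < 0.0:
        # Unsafe: command a proportional backward recovery motion.
        return alpha * h
    # Safe: cap forward speed so the barrier cannot be violated in one step.
    return min(v_cmd, alpha * h)
```

A real deployment would solve a CBF-constrained quadratic program over the full velocity command rather than this scalar cap, but the structure (barrier from depth, margin adapted to the scene, recovery when h < 0) mirrors what the summary describes.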
Abstract
In the rapidly evolving field of vision-language navigation (VLN), ensuring robust safety mechanisms remains an open challenge. Control barrier functions (CBFs) are efficient tools that guarantee safety by solving an optimal control problem. In this work, we consider a teleoperated drone in a VLN setting and add safety features by formulating a novel scene-aware CBF using ego-centric observations obtained through an RGB-D sensor. As a baseline, we implement a vision-language understanding module that uses the contrastive language-image pretraining (CLIP) model to query for a landmark specified by the user in natural language. Detections from the YOLO (You Only Look Once) object detector are cropped and verified by the CLIP model, triggering downstream navigation. To improve the navigation safety of this baseline, we propose ASMA -- an Adaptive Safety Margin Algorithm -- that crops the drone's depth map to track moving objects and perform scene-aware CBF evaluation on the fly. By identifying potentially risky observations in the scene, ASMA enables real-time adaptation to unpredictable environmental conditions, ensuring safe bounds on the actions of a VLN-powered drone. Using the robot operating system (ROS) middleware on a Parrot Bebop2 quadrotor in the Gazebo environment, ASMA offers a 59.4% to 61.8% increase in success rate with insignificant (5.4% to 8.2%) increases in trajectory length compared to the baseline CBF-less VLN, while recovering from unsafe situations.
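The landmark-verification step described in the abstract (a YOLO crop checked by CLIP against the user's instruction) reduces, at its core, to comparing image and text embeddings. The sketch below shows only that matching step, assuming the crop and instruction have already been embedded by CLIP's image and text encoders (not shown); the function name and threshold value are illustrative assumptions, not the paper's API.

```python
import numpy as np

def verify_landmark(crop_embedding, text_embedding, threshold=0.3):
    """Hypothetical sketch of CLIP-style landmark verification.

    crop_embedding : embedding of a YOLO detection crop (image encoder output)
    text_embedding : embedding of the user's landmark phrase (text encoder output)
    Returns True if cosine similarity clears the threshold, triggering navigation.
    """
    a = crop_embedding / np.linalg.norm(crop_embedding)
    b = text_embedding / np.linalg.norm(text_embedding)
    return float(a @ b) >= threshold
```

In practice one would score every detected crop against the instruction and navigate toward the best match above the threshold.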