🤖 AI Summary
To address the challenge of safe, autonomous UAV landing in complex and unknown environments, this paper proposes a monocular 3D perception framework that formulates Safe Landing Zone (SLZ) estimation as an end-to-end binary semantic segmentation task. Methodologically, it builds on a fine-tuned Metric3D V2 backbone, jointly optimizes depth and surface-normal prediction, and adds a dedicated SLZ segmentation head. The model is fine-tuned on the WildUAV dataset, annotated from a drone's viewpoint, and a separate cross-domain dataset is constructed for evaluation. Key contributions include: (1) the first monocular-image-based framework enabling joint quantitative estimation of SLZ location and area; (2) zero-shot cross-domain generalization retained from Metric3D V2, with improved robustness and segmentation accuracy; and (3) real-time performance and operational reliability validated on an actual UAV decision-making system.
📝 Abstract
This paper presents VisLanding, a monocular 3D perception-based framework for safe UAV (Unmanned Aerial Vehicle) landing. To address the core challenge of autonomous UAV landing in complex and unknown environments, the study leverages the joint depth and surface-normal prediction capabilities of the Metric3D V2 model to construct an end-to-end safe landing zone (SLZ) estimation framework. By introducing a safe zone segmentation branch, we transform landing zone estimation into a binary semantic segmentation problem. The WildUAV dataset, captured from a UAV perspective, is annotated and used to fine-tune the model, and a cross-domain evaluation dataset is constructed to validate the model's robustness. Experimental results demonstrate that VisLanding significantly improves the accuracy of safe zone identification through a depth-normal joint optimization mechanism while retaining the zero-shot generalization advantages of Metric3D V2. In cross-domain testing, the proposed method generalizes better and is more robust than other approaches. Furthermore, it can estimate landing zone area by combining the predicted depth and normal information, providing critical decision-making support for practical applications.
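The abstract does not spell out how area is computed from depth and normals, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), metric depth along the optical axis, and unit surface normals in the camera frame. The function name `slz_area_m2` and the clamp `min_dot` are hypothetical. Under these assumptions, a pixel at depth Z with unnormalized ray d = ((u - cx)/fx, (v - cy)/fy, 1) covers a surface patch of approximately Z² / (fx · fy · |n · d|), and the SLZ area is the sum of these patches over the safe mask.

```python
import numpy as np

def slz_area_m2(mask, depth, normals, fx, fy, cx, cy, min_dot=0.1):
    """Approximate the metric area of the pixels marked safe.

    mask:    (H, W) binary SLZ segmentation (nonzero = safe)
    depth:   (H, W) metric depth along the optical axis, in metres
    normals: (H, W, 3) unit surface normals in the camera frame
    min_dot: hypothetical clamp that keeps grazing-angle pixels
             from dominating the sum
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    # Unnormalized viewing ray per pixel, with z-component fixed to 1.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones((h, w))], axis=-1)
    # |n . d| accounts for the tilt of the surface relative to the ray.
    n_dot_d = np.abs(np.sum(normals * rays, axis=-1))
    n_dot_d = np.maximum(n_dot_d, min_dot)
    # Surface area covered by each pixel, in square metres.
    pixel_area = depth ** 2 / (fx * fy * n_dot_d)
    return float(np.sum(pixel_area[mask.astype(bool)]))
```

For a camera looking straight down at flat ground, the normal is antiparallel to the optical axis, |n · d| equals 1, and each pixel contributes Z² / (fx · fy), which is the familiar ground-sampling footprint. A decision module could compare the returned area against the vehicle's required landing footprint.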