Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment

📅 2025-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular visual pose estimation for unmanned aerial vehicles (UAVs) in maritime environments is difficult to validate in the real world: trials depend on expensive research vessels, and GPS performance degrades at sea. Method: This paper introduces the first high-fidelity, closed-loop vision-control simulation environment designed specifically for autonomous visual landing of shipboard UAVs. It integrates Gaussian splatting, a recent technique for fast, high-quality radiance field rendering, into dynamic maritime scene modeling, generating lightweight, photo-realistic 3D reconstructions from multi-view real-world imagery for training and closed-loop evaluation of depth-aware pose estimation networks. A Transformer-based monocular visual pose estimation algorithm is validated together with the flight control hardware and software in the simulated environment. Contribution/Results: The framework substantially reduces dependence on costly at-sea trials, shortens development cycles, and improves the reliability and robustness of vision-based autonomous landing algorithms under complex maritime conditions.

📝 Abstract
This paper proposes a vision-in-the-loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment. Recently, a deep neural network with a transformer architecture has been successfully trained to estimate the pose of a UAV relative to the flight deck of a research vessel, overcoming several limitations of GPS-based approaches. However, validating the deep pose estimation scheme in an actual ocean environment poses significant challenges due to the limited availability of research vessels and the associated operational costs. To address these issues, we present a photo-realistic 3D virtual environment leveraging recent advancements in Gaussian splatting, a novel technique that represents a 3D scene as a set of Gaussian primitives in 3D space, creating a lightweight and high-quality visual model from multiple viewpoints. This approach enables the creation of a virtual environment integrating multiple real-world images collected in situ. The resulting simulation enables the indoor testing of flight maneuvers while verifying all aspects of flight software, hardware, and the deep monocular pose estimation scheme. This approach provides a cost-effective solution for testing and validating the autonomous flight of shipboard UAVs, specifically focusing on vision-based control and estimation algorithms.
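To make the Gaussian splatting idea in the abstract concrete: each splat is a Gaussian primitive with a mean, covariance, opacity, and color, and a pixel's color is obtained by front-to-back alpha compositing of the depth-sorted splats covering it. The following is a minimal illustrative NumPy sketch of that compositing rule, not the paper's renderer; the function names and 2D simplification are assumptions for exposition.

```python
import numpy as np

def gaussian_alpha(x, mu, cov, opacity):
    """Effective opacity of one projected 2D Gaussian splat at pixel x.

    alpha = opacity * exp(-0.5 * (x - mu)^T cov^{-1} (x - mu))
    """
    d = np.asarray(x, float) - np.asarray(mu, float)
    m = d @ np.linalg.inv(cov) @ d  # squared Mahalanobis distance
    return float(opacity * np.exp(-0.5 * m))

def composite(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splats.

    C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)
    """
    transmittance, out = 1.0, np.zeros(3)
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, float)
        transmittance *= 1.0 - a  # light remaining after this splat
    return out
```

For example, a fully opaque red splat in front hides everything behind it, while two half-transparent splats blend with weights 0.5 and 0.25, which is why depth sorting of splats matters in the real pipeline.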
Problem

Research questions and friction points this paper is trying to address.

Monocular pose estimation of UAV
Vision-in-the-loop simulation environment
Testing in ocean-like virtual environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based deep neural network
Gaussian splatting for 3D scenes
Photo-realistic virtual environment simulation
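The innovation bullets above mention a transformer-based deep network for monocular pose estimation. A generic sketch of that family of models, in PyTorch, is shown below: image patches are embedded as tokens, passed through a transformer encoder, and pooled into a 6-DoF pose regression (translation plus unit quaternion). This is a hedged illustration of the general approach only; the class name, layer sizes, and output parameterization are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MonoPoseNet(nn.Module):
    """Illustrative sketch: patch embedding -> transformer encoder -> pose head."""

    def __init__(self, img_size=64, patch=16, dim=64, heads=4, layers=2):
        super().__init__()
        n_tokens = (img_size // patch) ** 2
        # Non-overlapping patch embedding via strided convolution.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, 7)  # translation (3) + quaternion (4)

    def forward(self, x):
        # (B, 3, H, W) -> (B, n_tokens, dim) token sequence.
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos
        feat = self.encoder(tokens).mean(dim=1)  # mean-pool tokens
        out = self.head(feat)
        t, q = out[:, :3], out[:, 3:]
        # Normalize so the rotation output is a valid unit quaternion.
        q = q / q.norm(dim=1, keepdim=True).clamp_min(1e-8)
        return t, q
```

In a vision-in-the-loop setup, a network of this kind would consume rendered frames from the Gaussian-splatting scene and feed its pose estimate back to the flight controller, closing the simulation loop.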
👥 Authors
Maneesha Wickramasuriya, The George Washington University
Beomyeol Yu, The George Washington University (UAV control, Reinforcement Learning, Geometric Deep Learning)
Taeyoung Lee, Mechanical and Aerospace Engineering, George Washington University, Washington, DC 20051
Murray Snyder, Mechanical and Aerospace Engineering, George Washington University, Washington, DC 20051