Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This paper addresses a fundamental limitation of unmanned aerial vehicle (UAV) visual servoing (VS): the inability to navigate when the target is initially invisible. To overcome this limitation, we propose a novel VS framework grounded in latent-space diffusion modeling. Our core innovation is the first integration of a return-conditioned latent-space denoising diffusion probabilistic model (DDPM) into VS, coupled with a cross-modal variational autoencoder that disentangles visual and motor representations. By explicitly modeling and generating control trajectories conditioned on partial or absent visual observations, our method enables reliable navigation even under non-line-of-sight (NLOS) conditions. Unlike conventional VS approaches, it does not require continuous target visibility. Extensive simulations on both quadrotor and hexarotor platforms demonstrate stable navigation with high convergence rates and strong robustness to occlusion and initialization uncertainty. This work establishes a new paradigm for NLOS visual servoing.
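The central mechanism above, a return-conditioned DDPM that generates trajectories in a latent space, can be illustrated with a minimal sketch of DDPM ancestral sampling. This is not the authors' implementation. It assumes classifier-free guidance on a scalar desired return and inpainting-style conditioning of the first latent state on the current observation, two techniques borrowed from earlier return-conditioned diffusion planners such as Decision Diffuser. The names `eps_model`, `sample_latent_trajectory` and `_clamp_start`, and every parameter value (schedule, `T`, `guidance`), are hypothetical, and a trained noise-prediction network would replace the stand-in predictor.

```python
import math
import random
from typing import Callable, List, Optional

# Hypothetical noise predictor eps_model(x_t, t, ret) -> predicted noise.
# ret=None selects the unconditional branch used by classifier-free guidance.
NoisePredictor = Callable[[List[float], int, Optional[float]], List[float]]


def linear_betas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear DDPM noise schedule beta_1..beta_T (values are assumptions)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def _clamp_start(x: List[float], start: Optional[List[float]]) -> List[float]:
    # Inpainting-style conditioning (assumed): overwrite the first latent state
    # with the latent of the current observation so the plan starts from it.
    return list(start) + x[len(start):] if start else x


def sample_latent_trajectory(
    eps_model: NoisePredictor,
    dim: int,
    target_return: float,
    start: Optional[List[float]] = None,
    T: int = 50,
    guidance: float = 1.5,
    seed: int = 0,
) -> List[float]:
    """Ancestral DDPM sampling of a flattened latent trajectory of length `dim`,
    steered toward `target_return` with classifier-free guidance."""
    rng = random.Random(seed)
    betas = linear_betas(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # x_T ~ N(0, I), with the first state pinned to the current latent.
    x = _clamp_start([rng.gauss(0.0, 1.0) for _ in range(dim)], start)
    for t in reversed(range(T)):
        eps_c = eps_model(x, t, target_return)
        eps_u = eps_model(x, t, None)
        # Guided noise estimate: eps_u + w * (eps_c - eps_u).
        eps = [u + guidance * (c - u) for c, u in zip(eps_c, eps_u)]
        # Reverse-step mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t).
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # the sigma_t^2 = beta_t variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
        x = _clamp_start(x, start)
    return x
```

With a trained network, `start` would hold the latent of the current image and the remaining entries would be the planned future latents. A zero-noise stand-in only checks that the loop runs: `sample_latent_trajectory(lambda x, t, r: [0.0] * len(x), dim=8, target_return=1.0, start=[0.5, -0.5])`.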

๐Ÿ“ Abstract
In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs), exploring the application of generative models to vision-based navigation of UAVs (Uncrewed Aerial Vehicles). Unlike classical VS methods, the proposed approach can reach the desired target view even when the target is initially not visible. This is made possible by learning a latent representation that the DDPM uses for planning, together with a dataset of trajectories that includes initial views in which the target is not visible. The compact representation is learned from raw images using a Cross-Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space that drive the robotic platform to the desired visual target. The approach has been validated in simulation on two generic multi-rotor UAVs (a quadrotor and a hexarotor). The results show that we can successfully reach the visual target, even when it is not visible in the initial view.
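One common formulation of a cross-modal VAE, assumed here rather than taken from the paper, encodes a single input modality (the image) into a latent z and decodes several modalities from that same z. The sketch below swaps the convolutional networks for hypothetical dense layers and uses a made-up second modality, `state` (for example, a target-relative pose), to stand in for the motor-side information. It shows the reparameterization trick and the closed-form KL term that make up the standard VAE objective. All function names, the parameter dictionary layout, and `beta` are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, list]  # hypothetical layout: weight matrices and bias vectors


def linear(W: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Dense layer y = W x + b (toy stand-in for a CNN encoder or decoder)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """z = mu + sigma * eps with eps ~ N(0, I), keeping sampling differentiable in mu, sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def random_params(img_dim: int, state_dim: int, latent_dim: int, seed: int = 0) -> Params:
    """Small random weights so the sketch can run without training."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_mu": mat(latent_dim, img_dim), "b_mu": [0.0] * latent_dim,
        "W_lv": mat(latent_dim, img_dim), "b_lv": [0.0] * latent_dim,
        "W_img": mat(img_dim, latent_dim), "b_img": [0.0] * img_dim,
        "W_state": mat(state_dim, latent_dim), "b_state": [0.0] * state_dim,
    }


def cross_modal_vae_loss(
    image: List[float], state: List[float], params: Params, beta: float = 1.0, seed: int = 0
) -> Tuple[float, List[float]]:
    """Encode the image only, then decode BOTH the image and the second modality
    from the same latent z. Returns (loss, z)."""
    rng = random.Random(seed)
    mu = linear(params["W_mu"], params["b_mu"], image)
    logvar = linear(params["W_lv"], params["b_lv"], image)
    z = reparameterize(mu, logvar, rng)
    image_hat = linear(params["W_img"], params["b_img"], z)
    state_hat = linear(params["W_state"], params["b_state"], z)
    loss = mse(image_hat, image) + mse(state_hat, state) + beta * kl_to_standard_normal(mu, logvar)
    return loss, z
```

Because the state decoder only sees z, minimizing this loss pushes the image latent to carry information about the second modality as well, which is what makes such a latent usable as a compact planning space.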
Problem

Research questions and friction points this paper is trying to address.

Visual servoing with an initially invisible target
Latent diffusion for UAV navigation
Cross-modal learning for trajectory generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent DDPMs for visual servoing
Cross-Modal Variational Autoencoder
Navigation from target-invisible initial views