Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion

📅 2024-09-20

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper addresses a fundamental limitation of visual servoing (VS) for uncrewed aerial vehicles (UAVs): classical methods cannot navigate when the target is not visible in the initial view. To overcome this limitation, the authors propose a VS framework built on latent-space diffusion. Its core contribution is the integration of a return-conditioned latent denoising diffusion probabilistic model (DDPM) into VS, coupled with a Cross-Modal Variational Autoencoder that learns a compact latent representation from raw images. Given the current image, the DDPM generates latent-space trajectories that drive the platform toward the desired target view, so the method, unlike conventional VS approaches, does not require continuous target visibility. Simulations with a quadrotor and a hexarotor show that the target view can be reached even when the target is not visible in the initial view.

📝 Abstract

In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs) that explores the application of generative models to vision-based navigation of UAVs (Uncrewed Aerial Vehicles). Unlike classical VS methods, the proposed approach can reach the desired target view even when the target is initially not visible. This is made possible by learning a latent representation that the DDPM uses for planning, together with a dataset of trajectories that includes target-invisible initial views. A compact representation is learned from raw images using a Cross-Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space that drive the robotic platform to the desired visual target. The approach has been validated in simulation using two generic multi-rotor UAVs (a quadrotor and a hexarotor). The results show that the visual target can be reached successfully even if it is not visible in the initial view.
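The planning step described in the abstract (encode the current image, then let the DDPM generate a latent trajectory) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard DDPM ancestral sampling update with a linear noise schedule, and it assumes the current latent is imposed by pinning the first trajectory state after every denoising step. The names `sample_latent_trajectory`, `toy_noise_model`, and `target_return`, the schedule values, and the way the return is passed to the model are all hypothetical. A trained network would replace `toy_noise_model`, and the latent `z_current` would come from the Cross-Modal Variational Autoencoder.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (assumed values, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_model(traj, step, target_return):
    """Hypothetical stand-in for a trained, return-conditioned noise predictor.

    A real model would be a neural network taking the noisy trajectory,
    the diffusion step, and the desired return. This placeholder ignores
    `step` and `target_return` and only keeps the example runnable.
    """
    return 0.1 * traj


def sample_latent_trajectory(z_current, horizon, noise_model,
                             target_return=1.0, num_steps=50, seed=0):
    """Run DDPM reverse diffusion over a (horizon, latent_dim) trajectory."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Start from pure Gaussian noise, then pin the first state to the
    # latent of the current image (assumed conditioning scheme).
    traj = rng.standard_normal((horizon, z_current.shape[0]))
    traj[0] = z_current

    for t in range(num_steps - 1, -1, -1):
        eps_hat = noise_model(traj, t, target_return)
        # Posterior mean of the standard DDPM reverse step.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (traj - coef * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(traj.shape) if t > 0 else np.zeros_like(traj)
        traj = mean + np.sqrt(betas[t]) * noise
        # Re-impose the current latent so the plan starts where the robot is.
        traj[0] = z_current

    return traj


# Example: plan 8 latent states from a 3-dimensional current latent.
z_now = np.array([0.5, -0.2, 0.1])
plan = sample_latent_trajectory(z_now, horizon=8, noise_model=toy_noise_model)
```

In a closed-loop setting, one plausible use is to execute only the first part of `plan`, re-encode the new image, and sample again. Whether the paper replans this way is not stated in the abstract.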

Problem

Research questions and friction points this paper is trying to address.

Visual servoing with an initially invisible target

Latent diffusion for UAV navigation

Cross-modal learning for trajectory generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent DDPMs for visual servoing

Cross-Modal Variational Autoencoder

Navigation from target-invisible initial views
