NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar

📅 2024-08-30
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Deploying high-complexity multi-sensor visual grounding models on unmanned surface vehicles (USVs) in complex waterway environments is hindered by stringent constraints on computational load, power consumption, and environmental robustness. Method: This paper proposes a lightweight, low-power, natural-language-driven multimodal visual grounding framework that jointly leverages visible-light imagery and 4D millimeter-wave radar, marking the first prompt-guided approach for such fusion. It supports both bounding-box-level and mask-level grounding outputs. The method employs a lightweight multi-task Transformer architecture, incorporating a cross-modal prompt alignment mechanism, a decoupled fusion strategy for radar point clouds and image features, and a waterway-scene-adaptive training paradigm. Contribution/Results: Evaluated on the WaterVG dataset, the method achieves highly competitive accuracy and demonstrates strong robustness under adverse conditions (e.g., rain, fog, low illumination). Its inference power consumption remains below 1.2 W, significantly enhancing feasibility for long-endurance USV deployment.
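The prompt-guided, decoupled fusion described above can be sketched in a minimal form: each modality's tokens are gated by their similarity to the language prompt independently before being merged. All dimensions, variable names, and the gating function itself are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not taken from the paper.
D = 32        # shared embedding width
N_IMG = 64    # image patch tokens from the camera branch
N_RADAR = 16  # point tokens from the 4D radar branch

img_feats = rng.normal(size=(N_IMG, D))      # camera features
radar_feats = rng.normal(size=(N_RADAR, D))  # radar point-cloud features
prompt_emb = rng.normal(size=(D,))           # pooled text-prompt embedding

def prompt_gated(tokens, prompt):
    """Weight each token by its (softmax-normalized) similarity to the prompt."""
    scores = tokens @ prompt / np.sqrt(tokens.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w[:, None] * tokens  # same shape, prompt-relevant tokens emphasized

# Decoupled fusion: gate each modality separately, then concatenate the
# token sets for a downstream multi-task decoder.
fused = np.concatenate(
    [prompt_gated(img_feats, prompt_emb), prompt_gated(radar_feats, prompt_emb)],
    axis=0,
)

print(fused.shape)  # (80, 32): all image and radar tokens, prompt-gated
```

Keeping the two modalities decoupled until concatenation lets each branch stay small, which is consistent with the low-power design goal the summary describes.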

📝 Abstract
Recently, visual grounding and multi-sensor setups have been incorporated into perception systems for terrestrial autonomous driving and Unmanned Surface Vehicles (USVs), yet the high complexity of modern learning-based multi-sensor visual grounding models prevents their deployment on USVs in real-life settings. To this end, we design a low-power multi-task model named NanoMVG for waterway embodied perception, guiding both a camera and a 4D millimeter-wave radar to locate specific object(s) through natural language. NanoMVG can perform box-level and mask-level visual grounding tasks simultaneously. Compared to other visual grounding models, NanoMVG achieves highly competitive performance on the WaterVG dataset, particularly in harsh environments, while boasting ultra-low power consumption for long endurance.
Problem

Research questions and friction points this paper is trying to address.

Low-power multi-task visual grounding
Integration of camera and 4D mmWave radar
Efficient object localization in harsh environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-power multi-task model
Prompt-guided camera and radar
Box-level and mask-level grounding
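The box-level and mask-level multi-task outputs listed above can be illustrated with two lightweight heads reading from one shared feature map. This is a generic multi-task-head sketch under assumed dimensions, not NanoMVG's actual decoder.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shared feature grid -- dimensions are illustrative.
D, H, W = 32, 8, 8
fused = rng.normal(size=(H * W, D))  # one fused token per spatial cell

# Box head: pool all tokens, regress a normalized (cx, cy, w, h) box.
W_box = rng.normal(size=(D, 4)) * 0.1
box = 1.0 / (1.0 + np.exp(-(fused.mean(axis=0) @ W_box)))  # sigmoid -> [0, 1]

# Mask head: one logit per token, reshaped to an H x W binary mask.
w_mask = rng.normal(size=(D,)) * 0.1
mask = (fused @ w_mask).reshape(H, W) > 0

print(box.shape, mask.shape)  # (4,) (8, 8)
```

Sharing one backbone between both heads, rather than running two separate models, is the usual way a multi-task design keeps compute and power low.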