LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System

Date: 2026-01-19
AI Summary
This work proposes the first autonomous inspection framework that integrates large language models (LLMs) and vision-language models (VLMs) for heterogeneous drone–USV systems in port environments. Addressing the limitations of manual inspections and context-agnostic visual methods, the system interprets natural language instructions to generate symbolic task plans and performs real-time semantic compliance assessment with structured reporting. By replacing conventional finite-state machines with LLM-driven symbolic planning and VLM-based semantic understanding, the approach enables context-aware, adaptive, and resource-efficient collaborative inspection. Extensive validation on an extended MBZIRC maritime simulator and real robotic platforms demonstrates the system's capability to safely and efficiently conduct real-time semantic inspections in complex port settings, highlighting its readiness for practical deployment.

Abstract
Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing inspection methods often rely on manual operations and conventional computer vision techniques that lack scalability and contextual understanding. This study introduces an integrated engineering framework that combines Large Language Models (LLMs) and Vision Language Models (VLMs) to enable autonomous maritime port inspection using cooperative aerial and surface robotic platforms. The proposed framework replaces traditional state-machine mission planners with LLM-driven symbolic planning and improves perception pipelines through VLM-based semantic inspection, enabling context-aware and adaptive monitoring. The LLM module translates natural language mission instructions into executable symbolic plans with dependency graphs that encode operational constraints and ensure safe UAV-USV coordination. Meanwhile, the VLM module performs real-time semantic inspection and compliance assessment, generating structured reports with contextual reasoning. The framework was validated using the extended MBZIRC Maritime Simulator with realistic port infrastructure and further assessed through real-world robotic inspection trials. The lightweight on-board design ensures suitability for resource-constrained maritime platforms, advancing the development of intelligent, autonomous inspection systems. Project resources (code and videos) can be found here: https://github.com/Muhayyuddin/llm-vlm-fusion-port-inspection
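To make the idea of a symbolic plan with a dependency graph more concrete, the following is a minimal sketch, not the authors' implementation. The action names, the assumption that the UAV launches from and lands on the USV, and the `execution_stages` helper are all hypothetical and chosen only for illustration. The sketch uses Python's standard `graphlib` module to group actions into stages in which every prerequisite has already completed.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical symbolic plan (not taken from the paper): each action maps
# to the set of actions that must finish before it may start.
plan = {
    "usv_navigate_to_berth": set(),
    "uav_takeoff_from_usv": {"usv_navigate_to_berth"},
    "uav_inspect_crane": {"uav_takeoff_from_usv"},
    "usv_inspect_quay_wall": {"usv_navigate_to_berth"},
    "uav_land_on_usv": {"uav_inspect_crane", "usv_inspect_quay_wall"},
}


def execution_stages(plan):
    """Group actions into stages; actions within a stage have no mutual
    dependencies and could run concurrently on the UAV and the USV."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises CycleError if the plan contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(plan)
for index, stage in enumerate(stages, start=1):
    print(f"stage {index}: {', '.join(stage)}")
```

In this toy plan the landing action is held back until both inspection actions finish, which is one simple way a dependency graph can encode a coordination constraint. A plan with circular dependencies is rejected with `CycleError` before any action is dispatched.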
Problem

Research questions and friction points this paper is trying to address.

maritime port inspection
autonomous inspection
contextual understanding
scalability
heterogeneous robotic systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-VLM fusion
autonomous maritime inspection
heterogeneous UAV-USV system
symbolic planning
semantic compliance assessment
Muhayy ud Din
Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates
Waseem Akram
Khalifa University, UAE
Marine robotics, Autonomous systems, Computer vision
A. B. Bakht
Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates
Irfan Hussain
Assistant Professor, Khalifa University
Grasping, Mechatronics, Rehabilitation, Prosthesis