🤖 AI Summary
To address the challenges of multimodal integration, specifically the difficulty of fusing speech, visual, and ADS-B data, and the lack of human-in-the-loop (HITL) evaluation in aviation conflict detection, this work introduces the first open-source, modular, and lightweight HITL simulation platform. Built on the Godot engine, it integrates fine-tuned Whisper ASR, YOLOv8-based visual detection, ADS-B message parsing, and GPT-OSS-20B for structured reasoning, all interconnected via standardized JSON APIs that enable plug-and-play model integration and reproducible multi-scenario testing. The platform supports representative conflict scenarios in terminal maneuvering areas and along airways, including runway incursions and en-route conflicts. Empirical evaluation yields an average time-to-first-alert of 7.7 seconds, with ASR and vision processing latencies of roughly 5.9 s and 0.4 s, respectively, indicating effective multimodal coordination. Key contributions include: (1) a unified benchmarking framework; (2) a reproducible scenario suite; and (3) end-to-end HITL evaluation capability, which together improve the development efficiency and trustworthiness of flight safety assistance systems.
📝 Abstract
We introduce AIRHILT (Aviation Integrated Reasoning, Human-in-the-Loop Testbed), a modular and lightweight simulation environment designed to evaluate multimodal pilot and air traffic control (ATC) assistance systems for aviation conflict detection. Built on the open-source Godot engine, AIRHILT synchronizes pilot and ATC radio communications, visual scene understanding from camera streams, and ADS-B surveillance data within a unified, scalable platform. The environment supports pilot- and controller-in-the-loop interactions, providing a comprehensive scenario suite covering both terminal area and en route operational conflicts, including communication errors and procedural mistakes. AIRHILT offers standardized JSON-based interfaces that enable researchers to easily integrate, swap, and evaluate automatic speech recognition (ASR), visual detection, decision-making, and text-to-speech (TTS) models. We demonstrate AIRHILT through a reference pipeline incorporating fine-tuned Whisper ASR, YOLO-based visual detection, ADS-B-based conflict logic, and GPT-OSS-20B structured reasoning, and present preliminary results from representative runway-overlap scenarios, where the assistant achieves an average time-to-first-warning of approximately 7.7 s, with average ASR and vision latencies of approximately 5.9 s and 0.4 s, respectively. The AIRHILT environment and scenario suite are openly available, supporting reproducible research on multimodal situational awareness and conflict detection in aviation; code and scenarios are available at https://github.com/ogarib3/airhilt.
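To make the "standardized JSON-based interfaces" concrete, the sketch below shows the kind of event message a plug-and-play module might emit and how a dispatcher could route it. The field names (`source`, `timestamp_s`, `payload`) and the `route_event` helper are illustrative assumptions for this sketch, not AIRHILT's actual schema or API.

```python
import json

# Hypothetical event messages in the style of a standardized JSON interface;
# field names are assumptions for illustration, not the platform's real schema.
asr_event = {
    "source": "asr",                      # which module produced the event
    "timestamp_s": 12.4,                  # simulation time in seconds
    "payload": {
        "transcript": "cleared to land runway two seven",
        "confidence": 0.91,
    },
}

vision_event = {
    "source": "vision",
    "timestamp_s": 12.8,
    "payload": {
        "detections": [{"label": "aircraft", "bbox": [412, 208, 96, 40]}],
    },
}

def route_event(raw: str) -> str:
    """Decode a serialized JSON event and dispatch on its `source` field.

    A real testbed would hand the payload to the matching downstream model
    (ASR output to the reasoning module, detections to conflict logic, etc.);
    here we just return the source tag to show the routing idea.
    """
    event = json.loads(raw)
    return event["source"]

print(route_event(json.dumps(asr_event)))     # prints "asr"
print(route_event(json.dumps(vision_event)))  # prints "vision"
```

Because every module speaks the same envelope format, swapping in a different ASR or detection model only requires that it serialize its output into this shared structure, which is what makes benchmarking interchangeable components tractable.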