Prompt-and-Check: Using Large Language Models to Evaluate Communication Protocol Compliance in Simulation-Based Training

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In safety-critical domains such as maritime operations, manual assessment of procedural communication compliance suffers from low efficiency and poor reproducibility. To address this, we propose Prompt-and-Check: a zero-shot, context-augmented prompting framework that leverages open-source large language models (LLaMA 2/3, Mistral) on local GPU hardware (RTX 4070) to perform fine-grained compliance classification directly from dialogue transcripts—without model fine-tuning. The method enables context-aware reasoning and fully offline deployment. Experimental evaluation demonstrates strong agreement between model predictions and domain expert annotations (Cohen’s κ > 0.85), significantly enhancing automation and objectivity in post-training debriefing of simulation-based training. Prompt-and-Check establishes a lightweight, interpretable, and deployable paradigm for compliance assessment in high-reliability human–AI collaborative settings.
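The agreement score cited above (Cohen's κ > 0.85) can be computed directly from paired binary compliance labels. A minimal pure-Python sketch, using illustrative toy labels rather than the paper's data:

```python
# Cohen's kappa for binary compliance labels (1 = fulfilled, 0 = not),
# measuring model-vs-expert agreement corrected for chance.

def cohens_kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    p_yes = (sum(a) / n) * (sum(b) / n)                # chance "yes" agreement
    p_no = (1 - sum(a) / n) * (1 - sum(b) / n)         # chance "no" agreement
    p_e = p_yes + p_no                                 # expected chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy example: model and expert agree on 7 of 8 checklist items.
model  = [1, 1, 0, 1, 0, 1, 1, 0]
expert = [1, 1, 0, 1, 0, 1, 0, 0]
kappa = cohens_kappa(model, expert)  # 0.75 for this toy data
```

Values above roughly 0.8 are conventionally read as near-perfect agreement, which is why the reported κ > 0.85 supports replacing manual annotation.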

📝 Abstract
Accurate evaluation of procedural communication compliance is essential in simulation-based training, particularly in safety-critical domains where adherence to compliance checklists reflects operational competence. This paper explores a lightweight, deployable approach using prompt-based inference with open-source large language models (LLMs) that can run efficiently on consumer-grade GPUs. We present Prompt-and-Check, a method that uses context-rich prompts to evaluate whether each checklist item in a protocol has been fulfilled, based solely on transcribed verbal exchanges. We perform a case study in the maritime domain with participants performing an identical simulation task, and experiment with models such as LLaMA 2 7B, LLaMA 3 8B, and Mistral 7B, running locally on an RTX 4070 GPU. For each checklist item, a prompt incorporating relevant transcript excerpts is fed into the model, which outputs a compliance judgment. We assess model outputs against expert-annotated ground truth using classification accuracy and agreement scores. Our findings demonstrate that prompting enables effective context-aware reasoning without task-specific training. This study highlights the practical utility of LLMs in augmenting debriefing, performance feedback, and automated assessment in training environments.
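The per-item loop the abstract describes can be sketched as follows. The prompt template, excerpt selection, and `llm` callable are illustrative assumptions, not the paper's actual artifacts; in deployment the callable would wrap a locally hosted LLaMA 2/3 or Mistral model.

```python
# Sketch of the Prompt-and-Check loop: one context-rich prompt per
# checklist item, parsed into a binary compliance judgment.
# PROMPT_TEMPLATE and stub_llm are hypothetical stand-ins.

PROMPT_TEMPLATE = (
    "You are auditing a maritime simulation exercise.\n"
    "Checklist item: {item}\n"
    "Relevant transcript excerpt:\n{excerpt}\n"
    "Was this checklist item fulfilled? Answer YES or NO."
)

def check_compliance(checklist, excerpts, llm):
    """Return {checklist item: bool} judgments from transcript excerpts."""
    results = {}
    for item in checklist:
        prompt = PROMPT_TEMPLATE.format(item=item, excerpt=excerpts.get(item, ""))
        answer = llm(prompt)  # in practice: a local LLaMA/Mistral inference call
        results[item] = answer.strip().upper().startswith("YES")
    return results

# Stub model standing in for local GPU inference, for demonstration only.
def stub_llm(prompt):
    return "YES" if "Loud and clear" in prompt else "NO"

judgments = check_compliance(
    ["Perform radio check", "Confirm heading readback"],
    {"Perform radio check": "Officer: Radio check, over. Bridge: Loud and clear."},
    stub_llm,
)
```

Because each item is judged independently from its own excerpt, the approach needs no fine-tuning and runs fully offline, matching the zero-shot, on-premise framing above.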
Problem

Research questions and friction points this paper is trying to address.

Evaluating communication compliance in simulation-based training
Using LLMs to assess protocol checklist fulfillment from transcripts
Providing automated performance feedback without task-specific training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs for protocol compliance evaluation
Prompt-based inference with open-source language models
Context-rich prompts that analyze verbal exchanges for compliance
Vishakha Lall
Lead Research Engineer
Speech Recognition · Natural Language Processing · Computer Vision
Yisi Liu
Centre of Excellence in Maritime Safety, Singapore Polytechnic, Singapore