🤖 AI Summary
In safety-critical domains such as maritime operations, manual assessment of procedural communication compliance is slow and poorly reproducible. To address this, we propose Prompt-and-Check: a zero-shot, context-augmented prompting framework that uses open-source large language models (LLaMA 2, LLaMA 3, Mistral) on local GPU hardware (RTX 4070) to perform fine-grained compliance classification directly from dialogue transcripts, without model fine-tuning. The method enables context-aware reasoning and fully offline deployment. Experimental evaluation shows strong agreement between model predictions and domain-expert annotations (Cohen's κ > 0.85), improving the automation and objectivity of post-training debriefing in simulation-based training. Prompt-and-Check offers a lightweight, interpretable, and deployable paradigm for compliance assessment in high-reliability human–AI collaborative settings.
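
The paper itself does not include code; the following is a minimal sketch of the zero-shot prompt-and-check loop as the summary describes it, assuming llama-cpp-python with a quantized GGUF checkpoint running on the local GPU. The model filename, prompt wording, and `check_item` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: zero-shot compliance checking with a local open-source LLM.
# Assumes llama-cpp-python and a quantized GGUF model file; names are illustrative.
from llama_cpp import Llama

PROMPT_TEMPLATE = """You are auditing a maritime training exercise.
Checklist item: {item}
Relevant transcript excerpt:
{excerpt}
Question: Was this checklist item fulfilled in the conversation?
Answer with exactly one word, YES or NO.
Answer:"""

# Load the model fully onto the GPU (n_gpu_layers=-1 offloads all layers).
llm = Llama(model_path="llama-3-8b-instruct.Q4_K_M.gguf",
            n_ctx=4096, n_gpu_layers=-1, verbose=False)

def check_item(item: str, excerpt: str) -> bool:
    """Return True if the model judges the checklist item as fulfilled."""
    prompt = PROMPT_TEMPLATE.format(item=item, excerpt=excerpt)
    out = llm(prompt, max_tokens=4, temperature=0.0)  # deterministic, zero-shot
    return out["choices"][0]["text"].strip().upper().startswith("YES")

if __name__ == "__main__":
    verdict = check_item(
        "Officer confirms course change with the helmsman",
        "OOW: Starboard ten. Helmsman: Starboard ten. OOW: Thank you.",
    )
    print("compliant" if verdict else "non-compliant")
```

Constraining the output to a single YES/NO token keeps the judgment trivially parseable and interpretable at debrief time, which is one plausible reading of the "fine-grained compliance classification" the summary describes.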
📝 Abstract
Accurate evaluation of procedural communication compliance is essential in simulation-based training, particularly in safety-critical domains where adherence to compliance checklists reflects operational competence. This paper explores a lightweight, deployable approach using prompt-based inference with open-source large language models (LLMs) that run efficiently on consumer-grade GPUs. We present Prompt-and-Check, a method that uses context-rich prompts to evaluate whether each checklist item in a protocol has been fulfilled, based solely on transcribed verbal exchanges. We conduct a case study in the maritime domain in which participants complete an identical simulation task, and experiment with models such as LLaMA 2 7B, LLaMA 3 8B, and Mistral 7B, running locally on an RTX 4070 GPU. For each checklist item, a prompt incorporating relevant transcript excerpts is fed into the model, which outputs a compliance judgment. We assess model outputs against expert-annotated ground truth using classification accuracy and agreement scores. Our findings demonstrate that prompting enables effective context-aware reasoning without task-specific training. This study highlights the practical utility of LLMs in augmenting debriefing, performance feedback, and automated assessment in training environments.
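
The abstract says model outputs are assessed against expert-annotated ground truth using classification accuracy and agreement scores. A minimal sketch of that evaluation step, assuming binary per-item verdicts and scikit-learn; the label arrays are illustrative placeholders, not the paper's data.

```python
# Sketch of the evaluation step: comparing model compliance judgments against
# expert annotations with accuracy and Cohen's kappa (scikit-learn).
from sklearn.metrics import accuracy_score, cohen_kappa_score

expert = [1, 1, 0, 1, 0, 0, 1, 1]  # expert-annotated ground truth (1 = fulfilled)
model  = [1, 1, 0, 1, 0, 1, 1, 1]  # per-checklist-item verdicts from the LLM

print(f"accuracy:  {accuracy_score(expert, model):.3f}")
print(f"Cohen's κ: {cohen_kappa_score(expert, model):.3f}")
```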