FirstAidQA: A Synthetic Dataset for First Aid and Emergency Response in Low-Connectivity Settings

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
High-quality domain-specific datasets for emergency medical response are scarce in low-connectivity or offline settings, hindering the development of reliable AI systems for critical care. Method: We introduce FirstAidQA—the first synthetic question-answering dataset tailored to emergency medicine—comprising 5,500 high-precision, human-verified QA pairs. Leveraging ChatGPT-4o-mini with domain-specific prompt engineering and in-context learning, we generate initial samples, followed by rigorous text cleaning, chunking, safety filtering, and multi-round expert validation to ensure clinical accuracy, safety, and practical utility. Contribution/Results: FirstAidQA bridges a critical data gap for safety-critical AI in resource-constrained prehospital environments. It enables efficient instruction fine-tuning of lightweight models, significantly improving offline inference speed, reliability, and real-world deployability. The dataset and code are publicly released.

Technology Category

Application Category

📝 Abstract
In emergency situations, every second counts. The deployment of Large Language Models (LLMs) in time-sensitive, low or zero-connectivity environments remains limited. Current models are computationally intensive and unsuitable for low-tier devices often used by first responders or civilians. A major barrier to developing lightweight, domain-specific solutions is the lack of high-quality datasets tailored to first aid and emergency response. To address this gap, we introduce FirstAidQA, a synthetic dataset containing 5,500 high-quality question answer pairs that encompass a wide range of first aid and emergency response scenarios. The dataset was generated using a Large Language Model, ChatGPT-4o-mini, with prompt-based in-context learning, using texts from the Vital First Aid Book (2019). We applied preprocessing steps such as text cleaning, contextual chunking, and filtering, followed by human validation to ensure accuracy, safety, and practical relevance of the QA pairs. FirstAidQA is designed to support instruction-tuning and fine-tuning of LLMs and Small Language Models (SLMs), enabling faster, more reliable, and offline-capable systems for emergency settings. We publicly release the dataset to advance research on safety-critical and resource-constrained AI applications in first aid and emergency response. The dataset is available on Hugging Face at https://huggingface.co/datasets/i-am-mushfiq/FirstAidQA.
Problem

Research questions and friction points this paper is trying to address.

Developing lightweight AI for first aid in low-connectivity environments
Addressing the lack of tailored datasets for emergency response scenarios
Enabling offline-capable language models for time-sensitive emergency situations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created synthetic dataset using ChatGPT-4o-mini
Applied preprocessing and human validation steps
Designed for tuning language models offline capability
🔎 Similar Papers
No similar papers found.
S
Saiyma Sittul Muna
Islamic University of Technology, Dhaka, Bangladesh
R
Rezwan Islam Salvi
Islamic University of Technology, Dhaka, Bangladesh
M
Mushfiqur Rahman Mushfique
Islamic University of Technology, Dhaka, Bangladesh
Ajwad Abrar
Ajwad Abrar
Junior Lecturer, IUT
Natural Language ProcessingHuman Computer InteractionSoftware Engineering