🤖 AI Summary
This study addresses the classification of kinetic-energy-related injuries in hospital triage texts, tackling three key challenges: data privacy sensitivity, high annotation costs, and constrained edge-computing resources. We propose a lightweight two-stage fine-tuning paradigm: first, preliminary fine-tuning of a pre-trained large language model on 2K open-source samples on a GPU; second, efficient secondary adaptation on 1,000 anonymized hospital records, executed entirely on a CPU. This approach removes the need for high-end hardware on site and for extensive expert annotation, substantially lowering deployment barriers. Experimental results demonstrate that the model maintains high classification accuracy under low-resource conditions while keeping patient data on site and remaining computationally efficient and clinically applicable. The method provides a practical, deployable solution for intelligent triage in resource-limited primary healthcare settings.
📝 Abstract
Triage notes, created at the start of a patient's hospital visit, contain a wealth of information that can help medical staff and researchers understand Emergency Department patient epidemiology and the degree of time-dependent illness or injury. Unfortunately, applying modern Natural Language Processing and Machine Learning techniques to analyse triage data faces several challenges. Firstly, hospital data contains highly sensitive information that is subject to privacy regulation and thus needs to be analysed on site. Secondly, most hospitals and medical facilities lack the hardware necessary to fine-tune a Large Language Model (LLM), much less train one from scratch. Lastly, identifying the records of interest requires experts to label the datasets manually, which can be time-consuming and costly. In this paper we present a pipeline that enables the classification of triage data using an LLM and limited compute resources. We first fine-tuned a pre-trained LLM with a classifier using a small (2k) open-source dataset on a GPU, and then further fine-tuned the model with a hospital-specific dataset of 1000 samples on a CPU. We demonstrate that, by carefully curating the datasets and leveraging existing models and open-source data, triage data can be classified successfully with limited compute resources.
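The two-stage schedule described above can be illustrated with a toy sketch. This is not the authors' implementation: a bag-of-words logistic-regression classifier stands in for the pre-trained LLM and its classifier head so that the example runs with the Python standard library alone, and every text, label, learning rate, and epoch count below is invented for illustration. What it does show is the core idea: the second stage continues training from the first stage's weights on a small, site-specific dataset.

```python
import math

def tokens(text):
    """Lower-cased whitespace tokens (a stand-in for a real tokenizer)."""
    return text.lower().split()

def predict_proba(weights, bias, text):
    """Probability of the positive class under a bag-of-words logistic model."""
    z = bias + sum(weights.get(tok, 0.0) for tok in tokens(text))
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(weights, bias, data, epochs, lr):
    """Continue SGD on the logistic loss, starting from the given parameters.

    Returns updated copies; the input weights are left untouched.
    """
    weights = dict(weights)
    for _ in range(epochs):
        for text, label in data:
            err = predict_proba(weights, bias, text) - label
            for tok in tokens(text):
                weights[tok] = weights.get(tok, 0.0) - lr * err
            bias -= lr * err
    return weights, bias

def mean_log_loss(weights, bias, data):
    """Average negative log-likelihood of the labels."""
    total = 0.0
    for text, label in data:
        p = min(max(predict_proba(weights, bias, text), 1e-12), 1 - 1e-12)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(data)

# Hypothetical stand-in for the open-source dataset (1 = injury of interest).
OPEN_DATA = [
    ("fell off ladder and hurt arm", 1),
    ("car crash with chest pain", 1),
    ("cough and fever for three days", 0),
    ("rash on both legs", 0),
]

# Hypothetical stand-in for the hospital-specific notes, with local shorthand.
HOSPITAL_DATA = [
    ("pt mva neck pain", 1),
    ("pt fall from bike wrist injury", 1),
    ("pt sob and fever", 0),
    ("pt abdo pain vomiting", 0),
]

# Stage 1: train on the open data (done on a GPU in the paper's pipeline).
w1, b1 = fine_tune({}, 0.0, OPEN_DATA, epochs=100, lr=0.2)

# Stage 2: continue from the stage-1 parameters on the local data
# (done on a CPU, on site, in the paper's pipeline).
w2, b2 = fine_tune(w1, b1, HOSPITAL_DATA, epochs=100, lr=0.1)

loss_before = mean_log_loss(w1, b1, HOSPITAL_DATA)
loss_after = mean_log_loss(w2, b2, HOSPITAL_DATA)
print(f"hospital log-loss before stage 2: {loss_before:.3f}")
print(f"hospital log-loss after stage 2:  {loss_after:.3f}")
```

In a real deployment the second call would load the stage-1 checkpoint of the LLM on the hospital machine, so the sensitive records never have to leave the site.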