🤖 AI Summary
This study addresses the task of named entity recognition and classification of toxic habits—specifically tobacco, alcohol, cannabis, and drugs—in Spanish clinical texts by participating in Subtask 1 of the ToxHabits shared task. The work proposes zero-shot and few-shot prompting strategies leveraging a large language model (GPT-4.1), enhanced through prompt engineering techniques. Evaluated in a low-resource, non-English clinical setting, the approach demonstrates the effectiveness and advantages of few-shot learning for this domain. On the official test set, the system achieves an F1 score of 0.65, indicating strong performance in identifying and classifying toxic habit entities within Spanish clinical narratives. These results suggest that prompt-based methods with large language models offer a viable pathway for multilingual clinical information extraction, particularly in resource-constrained languages.
📝 Abstract
The paper presents an approach for the recognition of toxic habits named entities in Spanish clinical texts. The approach was developed for the ToxHabits Shared Task. Our team participated in subtask 1, which aims to detect substance use and abuse mentions in clinical case reports and classify them in four categories (Tobacco, Alcohol, Cannabis, and Drug). We explored various methods of utilizing LLMs for the task, including zero-shot, few-shot, and prompt optimization, and found that GPT-4.1's few-shot prompting performed the best in our experiments. Our method achieved an F1 score of 0.65 on the test set, demonstrating a promising result for recognizing named entities in languages other than English.