FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the task of named entity recognition and classification of toxic habits—specifically tobacco, alcohol, cannabis, and drugs—in Spanish clinical texts by participating in Subtask 1 of the ToxHabits shared task. The work proposes zero-shot and few-shot prompting strategies leveraging a large language model (GPT-4.1), enhanced through prompt engineering techniques. Evaluated in a low-resource, non-English clinical setting, the approach demonstrates the effectiveness and advantages of few-shot learning for this domain. On the official test set, the system achieves an F1 score of 0.65, indicating strong performance in identifying and classifying toxic habit entities within Spanish clinical narratives. These results suggest that prompt-based methods with large language models offer a viable pathway for multilingual clinical information extraction, particularly in resource-constrained languages.
📝 Abstract
The paper presents an approach for the recognition of toxic habits named entities in Spanish clinical texts. The approach was developed for the ToxHabits Shared Task. Our team participated in subtask 1, which aims to detect substance use and abuse mentions in clinical case reports and classify them in four categories (Tobacco, Alcohol, Cannabis, and Drug). We explored various methods of utilizing LLMs for the task, including zero-shot, few-shot, and prompt optimization, and found that GPT-4.1's few-shot prompting performed the best in our experiments. Our method achieved an F1 score of 0.65 on the test set, demonstrating a promising result for recognizing named entities in languages other than English.
Problem

Research questions and friction points this paper is trying to address.

toxic habits
named entity recognition
Spanish clinical texts
substance use
LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
few-shot prompting
toxic habit extraction
Spanish clinical text
named entity recognition
🔎 Similar Papers
No similar papers found.
S
Sylvia Vassileva
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
I
Ivan Koychev
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria
Svetla Boytcheva
Svetla Boytcheva
Ontotext
Artificial IntelligenceComputational LinguisticsMedical InformaticsMachine Learning