AI Summary
This study addresses the challenge of identifying substance use in low-resource, unstructured Spanish electronic health records by proposing a multi-task ensemble deep learning framework that jointly models toxicant named entity recognition (ToxNER) and usage context detection (ToxUse). The approach integrates the BETO pretrained language model with conditional random field (CRF) decoding, diverse training strategies, and a sentence-level filtering mechanism to enhance model accuracy and robustness under scarce labeled data. Experimental results demonstrate that the system achieves an F1 score of 0.94 and precision of 0.97 for trigger detection, and an F1 score of 0.91 for argument detection, significantly outperforming baseline methods.
Abstract
Extracting drug use information from unstructured Electronic Health Records remains a major challenge in clinical Natural Language Processing (NLP). Although Large Language Models have advanced the field, their use in clinical NLP is limited by concerns over trust, control, and efficiency. To address this, we present the NOWJ submission to the ToxHabits Shared Task at BioCreative IX. This task targets the detection of toxic substance use and its contextual attributes in Spanish clinical texts, a domain-specific, low-resource setting. We propose a multi-output ensemble system that tackles both Subtask 1 (ToxNER) and Subtask 2 (ToxUse). Our system integrates BETO with a CRF layer for sequence labeling, employs diverse training strategies, and uses sentence filtering to boost precision. Our top run achieved 0.94 F1 and 0.97 precision for Trigger Detection, and 0.91 F1 for Argument Detection.
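The abstract does not specify how the ensemble outputs are combined or how the sentence filter decides, so the following is only a minimal sketch under stated assumptions: per-token majority voting over the BIO tag sequences of several models, followed by a sentence-level score that suppresses all entity tags when it falls below a threshold. The function names, the `B-TOX`/`I-TOX` tag labels, and the `0.5` threshold are hypothetical and chosen for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tags from several models by majority vote.

    `predictions` is a list of tag sequences, one per model, all of the
    same length. Voting is an assumed combination rule, not necessarily
    the one used by the system described above.
    """
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must tag the same number of tokens")
    voted = []
    for i in range(n_tokens):
        counts = Counter(p[i] for p in predictions)
        # most_common(1) returns the tag with the highest vote count
        voted.append(counts.most_common(1)[0][0])
    return voted


def filter_sentence(tags, sentence_score, threshold=0.5):
    """Drop all entity tags when the sentence is unlikely to be relevant.

    `sentence_score` stands in for the output of a hypothetical
    sentence-level relevance classifier; discarding low-scoring
    sentences trades some recall for precision.
    """
    if sentence_score < threshold:
        return ["O"] * len(tags)
    return list(tags)


if __name__ == "__main__":
    model_outputs = [
        ["O", "B-TOX", "I-TOX"],
        ["O", "B-TOX", "O"],
        ["O", "B-TOX", "I-TOX"],
    ]
    combined = majority_vote(model_outputs)
    print(filter_sentence(combined, sentence_score=0.9))
    print(filter_sentence(combined, sentence_score=0.2))
```

In this sketch the filter runs after voting, so a single low sentence score removes every predicted span in that sentence; applying the filter per model before voting would be an equally plausible design.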