NOWJ @BioCreative IX ToxHabits: An Ensemble Deep Learning Approach for Detecting Substance Use and Contextual Information in Clinical Texts

2026-02-10
Citations: 0
Influential: 0

AI Summary
This study addresses the challenge of identifying substance use in low-resource, unstructured Spanish electronic health records by proposing a multi-task ensemble deep learning framework that jointly models toxicant named entity recognition (ToxNER) and usage context detection (ToxUse). The approach integrates the BETO pretrained language model with conditional random field (CRF) decoding, diverse training strategies, and a sentence-level filtering mechanism to enhance model accuracy and robustness under scarce labeled data. Experimental results demonstrate that the system achieves an F1 score of 0.94 and precision of 0.97 for trigger detection, and an F1 score of 0.91 for argument detection, significantly outperforming baseline methods.

๐Ÿ“ Abstract
Extracting drug use information from unstructured Electronic Health Records remains a major challenge in clinical Natural Language Processing (NLP). While Large Language Models have advanced the field, their use in clinical NLP is limited by concerns over trust, control, and efficiency. To address this, we present the NOWJ submission to the ToxHabits Shared Task at BioCreative IX. The task targets the detection of toxic substance use and its contextual attributes in Spanish clinical texts, a domain-specific, low-resource setting. We propose a multi-output ensemble system that tackles both Subtask 1 (ToxNER) and Subtask 2 (ToxUse). Our system integrates BETO with a CRF layer for sequence labeling, employs diverse training strategies, and uses sentence filtering to boost precision. Our top run achieved 0.94 F1 and 0.97 precision for Trigger Detection, and 0.91 F1 for Argument Detection.
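The abstract reports F1 and precision for Trigger Detection but not recall. Since F1 is the harmonic mean of precision and recall, the recall implied by those two figures can be recovered by solving F1 = 2PR / (P + R) for R. The short Python sketch below does this; the function name is hypothetical, and because the reported scores are rounded to two decimals, the result is only an approximation, not a figure stated by the authors.

```python
def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("F1 and precision are not a consistent pair")
    return f1 * precision / denom


# Reported Trigger Detection scores from the abstract (rounded values).
trigger_recall = implied_recall(f1=0.94, precision=0.97)
print(round(trigger_recall, 2))  # roughly 0.91
```

Read this way, the top run favours precision over recall, which fits the abstract's statement that sentence filtering is used to boost precision.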
Problem

Research questions and friction points this paper is trying to address.

substance use detection
clinical text
contextual information extraction
low-resource setting
Electronic Health Records
Innovation

Methods, ideas, or system contributions that make the work stand out.

ensemble deep learning
BETO-CRF
sentence filtering
low-resource clinical NLP
multi-output system
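The summary does not say how the ensemble combines its members or how sentence filtering is applied, so the following Python sketch is only one plausible reading, under stated assumptions: several sequence labelers each emit one tag per token, the ensemble takes a per-token majority vote, and a separate sentence-level decision (here a plain boolean) suppresses all tags in sentences judged irrelevant. The function names and the `B-TOX` tag are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine tag sequences from several models by per-token majority vote.

    predictions: list of tag sequences, one per model, all the same length.
    Ties go to the tag that appears first among the models (assumption).
    """
    voted = []
    for tags_at_position in zip(*predictions):
        tag, _count = Counter(tags_at_position).most_common(1)[0]
        voted.append(tag)
    return voted


def filter_sentence(tags, keep_sentence):
    """Drop every prediction in a sentence flagged as irrelevant (assumption)."""
    return list(tags) if keep_sentence else ["O"] * len(tags)


# Three hypothetical model runs over the same three-token sentence.
runs = [
    ["O", "B-TOX", "O"],
    ["O", "B-TOX", "B-TOX"],
    ["O", "O", "O"],
]
voted = majority_vote(runs)
kept = filter_sentence(voted, keep_sentence=True)
dropped = filter_sentence(voted, keep_sentence=False)
print(voted)
```

Voting tends to discard labels that only a single model proposes, and the sentence gate removes whole sentences of likely false positives; both choices trade some recall for precision.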
Huu-Huy-Hoang Tran
University of Engineering and Technology, Vietnam National University
Gia-Bao Duong
University of Engineering and Technology, Vietnam National University
Quoc-Viet-Anh Tran
University of Engineering and Technology, Vietnam National University
Thi-Hai-Yen Vuong
VNU University of Engineering and Technology, Vietnam National University, Hanoi
Data mining, NLP, Legal NLP, Symbolic AI
Hoang-Quynh Le
University of Engineering and Technology, Vietnam National University