🤖 AI Summary
Existing clinical question-answering (CQA) systems struggle to perform fine-grained medical semantic classification of answers extracted from electronic medical records (EMRs), hindering structured retrieval and interpretable decision support. To address this, we propose a joint multi-task learning framework for CQA that simultaneously models answer span extraction and fine-grained classification into five standardized clinical categories: diagnosis, medication, symptoms, procedure, and lab reports. The architecture uses ClinicalBERT as a shared encoder feeding a dual-head decoder, one head for span prediction and another for category classification, trained end-to-end on the emrQA dataset. Experiments show a 2.2-point F1 improvement over standard fine-tuning and 90.7% accuracy in answer categorization. This significantly enhances the structured representation of EMR-derived clinical information and strengthens the foundation for explainable, category-aware clinical decision support.
📝 Abstract
Clinical Question Answering (CQA) plays a crucial role in medical decision-making, enabling physicians to extract relevant information from Electronic Medical Records (EMRs). While transformer-based models such as BERT, BioBERT, and ClinicalBERT have demonstrated state-of-the-art performance in CQA, existing models lack the ability to categorize extracted answers, which is critical for structured retrieval, content filtering, and medical decision support. To address this limitation, we introduce a Multi-Task Learning (MTL) framework that jointly trains CQA models for both answer extraction and medical categorization. In addition to predicting answer spans, our model classifies responses into five standardized medical categories: Diagnosis, Medication, Symptoms, Procedure, and Lab Reports. This categorization enables more structured and interpretable outputs, making clinical QA models more useful in real-world healthcare settings. We evaluate our approach on emrQA, a large-scale dataset for medical question answering. Results show that MTL improves F1-score by 2.2% compared to standard fine-tuning, while achieving 90.7% accuracy in answer categorization. These findings suggest that MTL not only enhances CQA performance but also introduces an effective mechanism for categorization and structured medical information retrieval.
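The shared-encoder, dual-head design described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a small `nn.TransformerEncoder` stands in for ClinicalBERT (which would normally be loaded from a pretrained checkpoint), and all hyperparameters, the loss weight `alpha`, and the pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualHeadCQA(nn.Module):
    """Sketch of a joint span-extraction + answer-categorization model.

    A tiny TransformerEncoder stands in for ClinicalBERT; dimensions and
    depth are illustrative assumptions, not the paper's configuration.
    """
    def __init__(self, vocab_size=1000, hidden=64, num_categories=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for ClinicalBERT
        self.span_head = nn.Linear(hidden, 2)               # start/end logits per token
        self.cls_head = nn.Linear(hidden, num_categories)   # one logit per clinical category

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))             # (B, T, H) shared representation
        start_logits, end_logits = self.span_head(h).split(1, dim=-1)
        cat_logits = self.cls_head(h[:, 0])                 # pool the first ([CLS]-style) token
        return start_logits.squeeze(-1), end_logits.squeeze(-1), cat_logits

def joint_loss(model, batch, alpha=0.5):
    """Multi-task objective: span loss plus a weighted classification loss.

    The weighting scheme (alpha) is an assumption for illustration.
    """
    start_logits, end_logits, cat_logits = model(batch["input_ids"])
    ce = nn.CrossEntropyLoss()
    span_loss = ce(start_logits, batch["start"]) + ce(end_logits, batch["end"])
    cat_loss = ce(cat_logits, batch["category"])
    return span_loss + alpha * cat_loss

# Toy batch: 2 questions+contexts of 16 token ids, with span and category labels.
model = DualHeadCQA()
batch = {
    "input_ids": torch.randint(0, 1000, (2, 16)),
    "start": torch.tensor([3, 5]),
    "end": torch.tensor([6, 9]),
    "category": torch.tensor([0, 4]),   # e.g. 0 = Diagnosis, 4 = Lab Reports
}
loss = joint_loss(model, batch)
```

Because both heads share one encoder, gradients from the categorization loss also shape the representation used for span extraction, which is the mechanism multi-task learning relies on here.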