Overview of the ClinicalSkillQA 2026 Shared Task on Continuous Perception and Procedural Reasoning in Clinical Skill Assessment

📅 2026-06-01

🤖 AI Summary
This work introduces a shared task that evaluates whether AI systems can integrate continuous visual perception, temporal structure reconstruction, and clinical workflow knowledge for clinical skill assessment. Systems must reorder shuffled clinical keyframes into their correct temporal sequence and generate rationales that experts can verify. The benchmark comprises 200 test-only instances covering three emergency-care procedures, and systems are scored on three metrics: Task Accuracy (exact sequence reconstruction), Pairwise Accuracy (local temporal consistency), and BERTScore (rationale quality). An analysis of 90 submissions from seven teams indicates that current models still struggle to combine visual evidence, temporal logic, and domain-specific knowledge. The shared task formalizes this reasoning problem as a benchmark for multimodal understanding in clinical settings.
📝 Abstract
This paper presents an overview of the ClinicalSkillQA 2026 shared task, which was organized with the BioNLP Workshop at ACL 2026. The goal of this shared task is to evaluate continuous perception and procedural reasoning in clinical skill assessment by requiring systems to reconstruct the correct temporal order of shuffled clinical key frames and generate rationales grounded in clinical workflow knowledge. The benchmark contains 200 test-only instances sampled from clinical skill videos, covering three emergency-care procedures. Each instance is annotated with the ground-truth temporal order and an expert-verified rationale. A total of seven teams participated in the task, collectively making 90 submissions, with four teams providing system description papers. Systems are evaluated using Task Accuracy, Pairwise Accuracy, and BERTScore, which measure exact sequence reconstruction, local temporal consistency, and rationale quality, respectively. In this paper, we describe the task setup, dataset construction, and evaluation criteria. We further summarize the methodologies adopted by participating teams and present a comprehensive analysis of the submitted systems. The official results suggest that current models still struggle with continuous perception and procedural reasoning, especially when they must integrate visual evidence, temporal structure, and clinical workflow knowledge.
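The two ordering metrics named in the abstract can be illustrated with a minimal Python sketch. The official scoring definitions are not given here, so this is an assumption: Task Accuracy is taken as the fraction of instances whose predicted order matches the ground truth exactly, and Pairwise Accuracy as the fraction of frame pairs placed in the correct relative order. The function names and frame labels are hypothetical, and the task's own scorer may differ (for example, by counting only adjacent pairs).

```python
from itertools import combinations
from typing import Hashable, Sequence


def task_accuracy(preds: Sequence[Sequence[Hashable]],
                  golds: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of instances whose predicted order equals the gold order exactly.

    Assumed definition; the official scorer may differ.
    """
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and the same length")
    exact = sum(list(p) == list(g) for p, g in zip(preds, golds))
    return exact / len(golds)


def pairwise_accuracy(pred: Sequence[Hashable],
                      gold: Sequence[Hashable]) -> float:
    """Fraction of frame pairs whose relative order in pred agrees with gold.

    Assumed definition over all pairs, not only adjacent ones.
    """
    if len(gold) < 2 or sorted(map(str, pred)) != sorted(map(str, gold)):
        raise ValueError("pred must be a permutation of gold with >= 2 frames")
    # Position of each frame in the predicted sequence.
    position = {frame: i for i, frame in enumerate(pred)}
    # combinations() yields (a, b) with a before b in the gold order.
    pairs = list(combinations(gold, 2))
    correct = sum(position[a] < position[b] for a, b in pairs)
    return correct / len(pairs)


# Hypothetical instance: four keyframes, with two adjacent frames swapped.
gold_order = ["A", "B", "C", "D"]
pred_order = ["A", "C", "B", "D"]

print(task_accuracy([pred_order], [gold_order]))  # exact match fails
print(pairwise_accuracy(pred_order, gold_order))  # 5 of 6 pairs agree
```

Under these assumed definitions, a single swap of neighboring frames drops Task Accuracy for that instance to zero while Pairwise Accuracy stays high, which is why reporting both gives a fuller picture of ordering quality.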
Problem

Research questions and friction points this paper is trying to address.

continuous perception
procedural reasoning
clinical skill assessment
temporal ordering
clinical workflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

continuous perception
procedural reasoning
clinical skill assessment
temporal ordering
rationale generation
Xiyang Huang
School of Artificial Intelligence, Wuhan University; Center for Language and Information Research, Wuhan University
Renxiong Wei
Zhongnan Hospital of Wuhan University
Yihuai Xu
School of Artificial Intelligence, Wuhan University; Center for Language and Information Research, Wuhan University
Zhiyuan Chen
School of Economics and Management, Wuhan University
Operations Management
Keying Wu
School of Artificial Intelligence, Wuhan University; Center for Language and Information Research, Wuhan University
Jiayi Xiang
School of Artificial Intelligence, Wuhan University; Center for Language and Information Research, Wuhan University
Buzhou Tang
Harbin Institute of Technology, Shenzhen
Yanqing Ye
Zhongnan Hospital of Wuhan University
Jinyu Chen
The Hong Kong Polytechnic University
Edge/cloud computing, Video transmission
Cheng Zeng
School of Artificial Intelligence, Wuhan University
Min Peng
School of Artificial Intelligence, Wuhan University; Center for Language and Information Research, Wuhan University
Qianqian Xie
Wuhan University
NLP, LLM
Sophia Ananiadou
Professor, Computer Science, Manchester University, National Centre for Text Mining
Natural Language Processing, Text Mining, Computational Linguistics, Artificial Intelligence