DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning

📅 2026-01-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a limitation of existing dermatological datasets, which are predominantly confined to image classification and thus insufficient for evaluating the integrated capabilities of vision-language models in clinical reasoning and fine-grained semantic alignment. To bridge this gap, the authors introduce DermaBench, the first visual question answering (VQA) benchmark tailored to dermatological clinical reasoning. Built upon the Diverse Dermatology Images (DDI) dataset, DermaBench encompasses 656 images from 570 patients across all Fitzpatrick skin types (I–VI). Through an expert-driven hierarchical annotation framework, it yields 14,474 structured VQA samples supporting multidimensional assessment, including diagnosis, anatomical location, and morphology, alongside open-ended descriptions and summaries. Notably, it introduces a multimodal clinical VQA annotation schema incorporating single-choice, multiple-choice, and open-ended question types. Released as a metadata-only dataset compatible with the original DDI license, DermaBench is publicly available on Harvard Dataverse, offering a standardized evaluation benchmark for multimodal models.

📝 Abstract
Vision-language models (VLMs) are increasingly important in medical applications; however, their evaluation in dermatology remains limited by datasets that focus primarily on image-level classification tasks such as lesion recognition. While valuable for recognition, such datasets cannot assess the full visual understanding, language grounding, and clinical reasoning capabilities of multimodal models. Visual question answering (VQA) benchmarks are required to evaluate how models interpret dermatological images, reason over fine-grained morphology, and generate clinically meaningful descriptions. We introduce DermaBench, a clinician-annotated dermatology VQA benchmark built on the Diverse Dermatology Images (DDI) dataset. DermaBench comprises 656 clinical images from 570 unique patients spanning Fitzpatrick skin types I–VI. Using a hierarchical annotation schema with 22 main questions (single-choice, multi-choice, and open-ended), expert dermatologists annotated each image for diagnosis, anatomic site, lesion morphology, distribution, surface features, color, and image quality, together with open-ended narrative descriptions and summaries, yielding approximately 14,474 VQA-style annotations. DermaBench is released as a metadata-only dataset to respect upstream licensing and is publicly available on Harvard Dataverse.
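The hierarchical schema described in the abstract (22 main questions per image, split across single-choice, multi-choice, and open-ended types) could be represented as records like the following. This is a minimal sketch: all field names, question texts, and IDs here are illustrative assumptions, not the dataset's actual released metadata keys.

```python
from dataclasses import dataclass, field

# Hypothetical record layout for one DermaBench-style VQA annotation.
# Field names are illustrative assumptions, not the released metadata schema.
@dataclass
class VQASample:
    image_id: str                 # reference to a DDI image (hypothetical ID format)
    question_id: int              # one of the 22 main questions
    question_type: str            # "single-choice" | "multi-choice" | "open-ended"
    question: str
    options: list = field(default_factory=list)  # empty for open-ended questions
    answer: object = None         # str for single-choice/open-ended, list for multi-choice

samples = [
    VQASample("ddi_000123", 1, "single-choice",
              "What is the Fitzpatrick skin type?",
              ["I", "II", "III", "IV", "V", "VI"], "V"),
    VQASample("ddi_000123", 7, "multi-choice",
              "Which surface features are present?",
              ["scale", "crust", "ulceration", "none"], ["scale", "crust"]),
    VQASample("ddi_000123", 21, "open-ended",
              "Describe the lesion.", [],
              "A well-demarcated hyperpigmented plaque."),
]

# Group samples by question type, e.g. to score choice questions by exact match
# and open-ended answers with a text-similarity metric.
by_type = {}
for s in samples:
    by_type.setdefault(s.question_type, []).append(s)

print(sorted(by_type))  # prints ['multi-choice', 'open-ended', 'single-choice']
```

Separating samples by question type this way mirrors how a benchmark harness would typically evaluate them: exact-match accuracy for choice questions and a generation metric for the open-ended descriptions and summaries.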
Problem

Research questions and friction points this paper is trying to address.

dermatology
visual question answering
vision-language models
clinical reasoning
benchmark dataset
Abdurrahim Yilmaz
Imperial College London
Deep Learning, AI for Dermatology, Microrobotics
Ozan Erdem
Istanbul Medeniyet University, Department of Dermatology and Venereology, Turkiye
Ece Gokyayla
Usak Research and Training Hospital, Department of Dermatology and Venereology, Turkiye
Ayda Acar
Ege University, Department of Dermatology and Venereology, Turkiye
Burc Bugra Dagtas
Ipswich Hospital, Department of Dermatology and Venereology, United Kingdom
Dilara İlhan Erdil
Medicana Atakoy Hospital, Department of Dermatology and Venereology, Turkiye
G. Gencoglan
Imperial College London, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, United Kingdom
Burak Temelkuran
Imperial College London