DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning

📅 2026-01-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a limitation of existing dermatological datasets, which are predominantly confined to image classification and thus insufficient for evaluating the integrated capabilities of vision-language models in clinical reasoning and fine-grained semantic alignment. To bridge this gap, the authors introduce DermaBench, the first visual question answering (VQA) benchmark tailored to dermatological clinical reasoning. Built upon the Diverse Dermatology Images (DDI) dataset, DermaBench encompasses 656 images from 570 patients across all Fitzpatrick skin types (I–VI). Through an expert-driven hierarchical annotation framework, it yields 14,474 structured VQA samples supporting multidimensional assessment, including diagnosis, anatomical location, and morphology, alongside open-ended descriptions and summaries. Notably, it introduces a multimodal clinical VQA annotation schema incorporating single-choice, multiple-choice, and open-ended question types. Released as a metadata-only dataset compatible with the original DDI license, DermaBench is publicly available on Harvard Dataverse, offering a standardized evaluation benchmark for multimodal models.

📝 Abstract
Vision-language models (VLMs) are increasingly important in medical applications; however, their evaluation in dermatology remains limited by datasets that focus primarily on image-level classification tasks such as lesion recognition. While valuable for recognition, such datasets cannot assess the full visual understanding, language grounding, and clinical reasoning capabilities of multimodal models. Visual question answering (VQA) benchmarks are required to evaluate how models interpret dermatological images, reason over fine-grained morphology, and generate clinically meaningful descriptions. We introduce DermaBench, a clinician-annotated dermatology VQA benchmark built on the Diverse Dermatology Images (DDI) dataset. DermaBench comprises 656 clinical images from 570 unique patients spanning Fitzpatrick skin types I–VI. Using a hierarchical annotation schema with 22 main questions (single-choice, multi-choice, and open-ended), expert dermatologists annotated each image for diagnosis, anatomic site, lesion morphology, distribution, surface features, color, and image quality, together with open-ended narrative descriptions and summaries, yielding approximately 14,474 VQA-style annotations. DermaBench is released as a metadata-only dataset to respect upstream licensing and is publicly available on Harvard Dataverse.
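The hierarchical schema described in the abstract (22 main questions per image, split across single-choice, multi-choice, and open-ended types) could be represented as records like the following. This is a minimal sketch: all field names, question texts, and IDs here are illustrative assumptions, not the dataset's actual released metadata keys.

```python
from dataclasses import dataclass, field

# Hypothetical record layout for one DermaBench-style VQA annotation.
# Field names are illustrative assumptions, not the released metadata schema.
@dataclass
class VQASample:
    image_id: str                 # reference to a DDI image (hypothetical ID format)
    question_id: int              # one of the 22 main questions
    question_type: str            # "single-choice" | "multi-choice" | "open-ended"
    question: str
    options: list = field(default_factory=list)  # empty for open-ended questions
    answer: object = None         # str for single-choice/open-ended, list for multi-choice

samples = [
    VQASample("ddi_000123", 1, "single-choice",
              "What is the Fitzpatrick skin type?",
              ["I", "II", "III", "IV", "V", "VI"], "V"),
    VQASample("ddi_000123", 7, "multi-choice",
              "Which surface features are present?",
              ["scale", "crust", "ulceration", "none"], ["scale", "crust"]),
    VQASample("ddi_000123", 21, "open-ended",
              "Describe the lesion.", [],
              "A well-demarcated hyperpigmented plaque."),
]

# Group samples by question type, e.g. to score choice questions by exact match
# and open-ended answers with a text-similarity metric.
by_type = {}
for s in samples:
    by_type.setdefault(s.question_type, []).append(s)

print(sorted(by_type))  # prints ['multi-choice', 'open-ended', 'single-choice']
```

Separating samples by question type this way mirrors how a benchmark harness would typically evaluate them: exact-match accuracy for choice questions and a generation metric for the open-ended descriptions and summaries.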
Problem

Research questions and friction points this paper is trying to address.

dermatology
visual question answering
vision-language models
clinical reasoning
benchmark dataset
Abdurrahim Yilmaz
Imperial College London
Deep Learning, AI for Dermatology, Microrobotics
Ozan Erdem
Istanbul Medeniyet University, Department of Dermatology and Venereology, Turkiye
Ece Gokyayla
Usak Research and Training Hospital, Department of Dermatology and Venereology, Turkiye
Ayda Acar
Ege University, Department of Dermatology and Venereology, Turkiye
Burc Bugra Dagtas
Ipswich Hospital, Department of Dermatology and Venereology, United Kingdom
Dilara İlhan Erdil
Medicana Atakoy Hospital, Department of Dermatology and Venereology, Turkiye
G. Gencoglan
Imperial College London, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, United Kingdom
Burak Temelkuran
Imperial College London