A benchmark multimodal oro-dental dataset for large vision-language models

📅 2025-11-07
🤖 AI Summary
Oral AI research is hindered by the scarcity of large-scale multimodal datasets that reflect clinical complexity. To address this, we introduce OralMed-DB, the first standardized, large-model-oriented dental multimodal benchmark. It comprises 8,775 clinical examinations from 4,800 patients, with aligned intraoral images, radiographs (e.g., panoramic and periapical X-rays), and structured electronic health record text. The benchmark supports two clinically relevant tasks: classification of six common dental abnormalities and generation of diagnostic reports. We fine-tune the Qwen-VL vision-language models (3B and 7B variants) on this data so that they combine radiographic analysis with natural language understanding. On both tasks, the fine-tuned models show substantial improvements over their base counterparts and GPT-4o, which supports the dataset's usefulness for training and benchmarking. This work provides foundational infrastructure for practical, deployable AI in dentistry.

📝 Abstract
The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, we present a comprehensive multimodal dataset, comprising 8775 dental checkups from 4800 patients collected over eight years (2018-2025), with patients ranging from 10 to 90 years of age. The dataset includes 50000 intraoral images, 8056 radiographs, and detailed textual records, including diagnoses, treatment plans, and follow-up notes. The data were collected under standard ethical guidelines and annotated for benchmarking. To demonstrate its utility, we fine-tuned state-of-the-art large vision-language models, Qwen-VL 3B and 7B, and evaluated them on two tasks: classification of six oro-dental anomalies and generation of complete diagnostic reports from multimodal inputs. We compared the fine-tuned models with their base counterparts and GPT-4o. The fine-tuned models achieved substantial gains over these baselines, validating the dataset and underscoring its effectiveness in advancing AI-driven oro-dental healthcare solutions. The dataset is publicly available, providing an essential resource for future research in AI dentistry.
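To put the reported scale in perspective, the short sketch below derives a few per-patient and per-checkup averages from the counts stated in the abstract. The helper name `per_unit` is illustrative, and the figure of 50000 intraoral images may be a rounded count, so the results should be read as rough averages rather than exact dataset statistics.

```python
# Headline counts as reported in the abstract.
PATIENTS = 4800
CHECKUPS = 8775
INTRAORAL_IMAGES = 50000  # possibly a rounded figure
RADIOGRAPHS = 8056


def per_unit(total: int, units: int) -> float:
    """Average number of items per unit, rounded to two decimals."""
    return round(total / units, 2)


checkups_per_patient = per_unit(CHECKUPS, PATIENTS)
images_per_checkup = per_unit(INTRAORAL_IMAGES, CHECKUPS)
radiographs_per_checkup = per_unit(RADIOGRAPHS, CHECKUPS)

print(checkups_per_patient, images_per_checkup, radiographs_per_checkup)
```

These are averages only and say nothing about how records are distributed. If each radiograph belongs to exactly one checkup (an assumption, not stated in the source), then fewer radiographs than checkups implies that some checkups have no radiograph.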

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of large multimodal datasets for oral healthcare AI
Classifying six oro-dental anomalies using vision-language models
Generating complete diagnostic reports from multimodal dental inputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large multimodal dataset for dental AI
Fine-tuned vision-language models on dental data
Benchmarking models for dental anomaly classification
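
As a rough illustration of how a six-class anomaly classifier might be scored, the sketch below computes accuracy and macro-averaged F1 in plain Python. Several things here are assumptions rather than details from the source: the task is treated as single-label, the class names are placeholders (the six anomalies are not listed in this summary), and the choice of metrics is ours, not necessarily the authors'.

```python
from collections import Counter

# Placeholder class names: the source says there are six anomaly classes
# but does not name them here, so these labels are hypothetical.
CLASSES = [f"anomaly_{i}" for i in range(6)]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that exactly match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str], classes: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (single-label setting)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # A class with no true or predicted examples scores 0.0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(classes)


# Toy data: one example per class, with the last one misclassified.
y_true = list(CLASSES)
y_pred = CLASSES[:5] + ["anomaly_0"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, CLASSES)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

Macro averaging weights every class equally, which matters when some anomalies are much rarer than others. If a single examination can carry several anomaly labels, a multi-label formulation would be needed instead.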

Haoxin Lv
Department of Oral Implantology, Suzhou Doctor Dental Clinic Co. Ltd, Suzhou, 215000, China.

Ijazul Haq
Shanghai Jiao Tong University
AI, NLP, Machine Learning, LLMs, Computer Vision

Jin Du
Guangdong Janus Biotechnology Co. Ltd, Guangzhou, 511400, China.

Jiaxin Ma
OMRON SINIC X Corporation

Binnian Zhu
Zhejiang CAS Angels Biotechnology Co. Ltd, Zhejiang, 314200, China.

Xiaobing Dang
Guangdong Janus Biotechnology Co. Ltd, Guangzhou, 511400, China.

Chaoan Liang
MingZheng Dental Clinic, Guangzhou, 511400, China.

Ruxu Du
Guangdong Janus Biotechnology Co. Ltd, Guangzhou, 511400, China.

Yingjie Zhang
Shien-Ming Wu School of Intelligent Manufacturing, South China University of Technology, Guangzhou, 511442, China.

Muhammad Saqib
Department of Software Engineering, University of Engineering & Technology, Peshawar, 25000, Pakistan.