A benchmark multimodal oro-dental dataset for large vision-language models

📅 2025-11-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Oral AI research is held back by the scarcity of large-scale multimodal datasets that reflect clinical complexity. To address this, we introduce OralMed-DB, the first standardized dental multimodal benchmark aimed at large models, comprising 4,800 patients and 8,775 clinical examinations with aligned intraoral images, radiographs (e.g., panoramic and periapical X-rays), and structured electronic health record text. The benchmark supports two clinically relevant tasks: classification of six common dental abnormalities and generation of diagnostic reports. We fine-tune vision-language models, specifically the 3B and 7B variants of Qwen-VL, on this data. On both tasks, the fine-tuned models show substantial gains over their base counterparts and GPT-4o, validating the dataset and its usefulness for dental AI. This work provides foundational infrastructure for building practical, deployable AI in dentistry.

📝 Abstract

The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, we present a comprehensive multimodal dataset, comprising 8775 dental checkups from 4800 patients collected over eight years (2018-2025), with patients ranging from 10 to 90 years of age. The dataset includes 50000 intraoral images, 8056 radiographs, and detailed textual records, including diagnoses, treatment plans, and follow-up notes. The data were collected under standard ethical guidelines and annotated for benchmarking. To demonstrate its utility, we fine-tuned state-of-the-art large vision-language models, Qwen-VL 3B and 7B, and evaluated them on two tasks: classification of six oro-dental anomalies and generation of complete diagnostic reports from multimodal inputs. We compared the fine-tuned models with their base counterparts and GPT-4o. The fine-tuned models achieved substantial gains over these baselines, validating the dataset and underscoring its effectiveness in advancing AI-driven oro-dental healthcare solutions. The dataset is publicly available, providing an essential resource for future research in AI dentistry.
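As a rough illustration of the setup described in the abstract, the sketch below shows one way a checkup record with aligned modalities could be represented, and how predictions for the six-class anomaly task could be scored. Everything here is an assumption made for illustration: the `CheckupRecord` schema, the placeholder label names (the abstract does not name the six anomalies), the multi-label framing, and the choice of macro-averaged F1 are hypothetical and are not taken from the paper or the released dataset.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder labels: the abstract does not name the six anomalies.
ANOMALY_LABELS = [f"anomaly_{i}" for i in range(6)]


@dataclass
class CheckupRecord:
    """One dental checkup with its aligned modalities (hypothetical schema)."""
    patient_id: str
    checkup_date: str  # ISO date string, e.g. "2021-03-15"
    intraoral_images: list[str] = field(default_factory=list)  # image file paths
    radiographs: list[str] = field(default_factory=list)       # radiograph file paths
    clinical_text: str = ""  # diagnosis, treatment plan, follow-up notes
    anomalies: set[str] = field(default_factory=set)           # gold labels


def per_class_f1(gold, pred, labels=ANOMALY_LABELS):
    """Return {label: F1}; gold and pred are equal-length lists of label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances scores 0.0 by convention here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=ANOMALY_LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(gold, pred, labels)
    return sum(scores.values()) / len(scores)
```

Under these assumptions, the gold sets would come from `CheckupRecord.anomalies` and the predicted sets from parsing each model's output, so the same scoring function can be applied to a fine-tuned model, its base counterpart, and GPT-4o alike.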

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of large multimodal datasets for oral healthcare AI

Classifying six oro-dental anomalies using vision-language models

Generating complete diagnostic reports from multimodal dental inputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large multimodal dataset for dental AI

Fine-tuned vision-language models on dental data

Benchmarking models for dental anomaly classification

🔎 Similar Papers

LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models

2024-10-02 | arXiv.org | Citations: 0
