Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology

📅 2025-10-16
🤖 AI Summary
Oral and maxillofacial radiology faces two challenges: a shortage of trained professionals and poor generalizability of existing AI models. Current approaches are predominantly single-task and single-modality, and they depend on costly labeled data. To address this, we introduce DentVFM, the first family of vision foundation models for dentistry, trained with self-supervised learning on DentVista, a curated, multi-center dataset of approximately 1.6 million multimodal radiographic images. DentVFM comes in 2D and 3D variants built on the Vision Transformer architecture. We further propose DentBench, an evaluation benchmark spanning eight dental subspecialties. Experiments show that DentVFM significantly outperforms supervised, self-supervised, and weakly supervised baselines across diverse tasks, including disease diagnosis, treatment analysis, biomarker identification, and anatomical landmark detection and segmentation. Its cross-modality diagnostic capability yields more reliable results than experienced dentists when conventional imaging is unavailable, and the model improves label efficiency and generalization.

📝 Abstract
Oral and maxillofacial radiology plays a vital role in dental healthcare, but radiographic image interpretation is limited by a shortage of trained professionals. While AI approaches have shown promise, existing dental AI systems are restricted by their single-modality focus, task-specific design, and reliance on costly labeled data, hindering their generalization across diverse clinical scenarios. To address these challenges, we introduce DentVFM, the first family of vision foundation models (VFMs) designed for dentistry. DentVFM generates task-agnostic visual representations for a wide range of dental applications and uses self-supervised learning on DentVista, a large curated dental imaging dataset with approximately 1.6 million multi-modal radiographic images from various medical centers. DentVFM includes 2D and 3D variants based on the Vision Transformer (ViT) architecture. To address gaps in dental intelligence assessment and benchmarks, we introduce DentBench, a comprehensive benchmark covering eight dental subspecialties, more diseases, imaging modalities, and a wide geographical distribution. DentVFM shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks, such as disease diagnosis, treatment analysis, biomarker identification, and anatomical landmark detection and segmentation. Experimental results indicate DentVFM significantly outperforms supervised, self-supervised, and weakly supervised baselines, offering superior generalization, label efficiency, and scalability. Additionally, DentVFM enables cross-modality diagnostics, providing more reliable results than experienced dentists in situations where conventional imaging is unavailable. DentVFM sets a new paradigm for dental AI, offering a scalable, adaptable, and label-efficient model to improve intelligent dental healthcare and address critical gaps in global oral healthcare.
Problem

Research questions and friction points this paper is trying to address.

Addressing dental AI limitations in generalization across clinical scenarios
Overcoming shortage of professionals for radiographic image interpretation
Developing adaptable vision models for multi-modal dental applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning on multi-modal dental images
Task-agnostic vision foundation models for dentistry
Comprehensive benchmark covering eight dental subspecialties
Xinrui Huang
Shanghai Jiao Tong University, School of Information Science and Electronic Engineering, Shanghai, 200240, China
Fan Xiao
Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Department of Oral Craniomaxillofacial, Shanghai, 200011, China
Dongming He
Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Department of Oral Craniomaxillofacial, Shanghai, 200011, China
Anqi Gao
Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Department of Oral Craniomaxillofacial, Shanghai, 200011, China
Dandan Li
Beijing University of Posts and Telecommunications, Associate Professor
Quantum Nonlocality, Quantum AI, Privacy Computation, Quantum Routing
Xiaofan Zhang
Shanghai Jiao Tong University, School of Computer Science, Shanghai, 200240, China
Shaoting Zhang
Shanghai AI Lab; SenseTime Research
Medical Image Analysis, Computer Vision, Foundation Models
Xudong Wang
Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Department of Oral Craniomaxillofacial, Shanghai, 200011, China; Shanghai Jiao Tong University, College of Stomatology, Shanghai 200125, China; National Center for Stomatology, Shanghai 200011, China; National Clinical Medical Research Center for Oral Diseases, Shanghai 200011, China; Shanghai Key Laboratory of Stomatology, Shanghai 200011, China; Shanghai Research Institute of Stomatology, Shanghai 200011, China; Chinese Ac