Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology

📅 2025-10-16

🤖 AI Summary

Oral and maxillofacial radiology faces two challenges: a shortage of trained professionals and poor generalizability of existing AI models. Current dental AI systems are predominantly single-task and single-modality, and they depend on costly labeled data. To address this, we introduce DentVFM, the first family of vision foundation models for dentistry, pretrained with self-supervised learning on DentVista, a curated, multi-center dataset of approximately 1.6 million multi-modal radiographic images. DentVFM comes in 2D and 3D variants, both based on the Vision Transformer (ViT) architecture. We also propose DentBench, an evaluation benchmark spanning eight dental subspecialties. Experiments show that DentVFM significantly outperforms supervised, self-supervised, and weakly supervised baselines across diverse tasks, including disease diagnosis, treatment analysis, biomarker identification, and anatomical landmark detection and segmentation. It also supports cross-modality diagnostics, giving more reliable results than experienced dentists when conventional imaging is unavailable, and it improves label efficiency and generalization.

📝 Abstract

Oral and maxillofacial radiology plays a vital role in dental healthcare, but radiographic image interpretation is limited by a shortage of trained professionals. While AI approaches have shown promise, existing dental AI systems are restricted by their single-modality focus, task-specific design, and reliance on costly labeled data, hindering their generalization across diverse clinical scenarios. To address these challenges, we introduce DentVFM, the first family of vision foundation models (VFMs) designed for dentistry. DentVFM generates task-agnostic visual representations for a wide range of dental applications and uses self-supervised learning on DentVista, a large curated dental imaging dataset with approximately 1.6 million multi-modal radiographic images from various medical centers. DentVFM includes 2D and 3D variants based on the Vision Transformer (ViT) architecture. To address gaps in dental intelligence assessment and benchmarks, we introduce DentBench, a comprehensive benchmark covering eight dental subspecialties, more diseases, imaging modalities, and a wide geographical distribution. DentVFM shows impressive generalist intelligence, demonstrating robust generalization to diverse dental tasks, such as disease diagnosis, treatment analysis, biomarker identification, and anatomical landmark detection and segmentation. Experimental results indicate DentVFM significantly outperforms supervised, self-supervised, and weakly supervised baselines, offering superior generalization, label efficiency, and scalability. Additionally, DentVFM enables cross-modality diagnostics, providing more reliable results than experienced dentists in situations where conventional imaging is unavailable. DentVFM sets a new paradigm for dental AI, offering a scalable, adaptable, and label-efficient model to improve intelligent dental healthcare and address critical gaps in global oral healthcare.

Problem

Research questions and friction points this paper is trying to address.

Addressing dental AI limitations in generalization across clinical scenarios

Overcoming shortage of professionals for radiographic image interpretation

Developing adaptable vision models for multi-modal dental applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning on multi-modal dental images

Task-agnostic vision foundation models for dentistry

Comprehensive benchmark covering eight dental subspecialties
